INTRODUCTION.


THE OBJECT OF CONFLUENCE ANALYSIS. THE DANGER OF INCLUDING
TOO MANY VARIATES IN A REGRESSION ANALYSIS.

In a paper "Correlation and Scatter ..." in the "Nordic Statistical
Journal", 1928, I drew attention to the fact that in statistical
regression analysis there exists a great danger of obtaining 
nonsensical results whenever one includes in one and the same 
regression equation a set of variates that contain two, or more, 
subsets which are already — taken by themselves — highly 
intercorrelated. Suppose, for instance, that we have three
statistical variates $x_1, x_2, x_3$ (measured from their means), and that
we know for a priori reasons that there exist not only one but
two independent linear equations between them (since the vari- 
ates are measured from their means, we may assume the equa- 
tions to be homogeneous). Further, suppose that a great number 
of observations are made, each observation giving the values 
of the three variates and being represented as a point in the 
three-dimensional $(x_1, x_2, x_3)$ space. All these observation points
would then lie on a straight line through the origin in $(x_1, x_2, x_3)$
space. From the distribution of these points it would be absurd
to try to determine the coefficients of either of the two equations
that we know a priori should exist between the variates. Indeed,
a set of points lying in a line does not contain enough informa- 
tion to determine a plane. More precisely the coefficients of 
this plane would contain a one dimensional indeterminateness. 
In this situation any attempt at determining from the available 
data a regression equation involving three variates would be 
sheer nonsense. 

If nevertheless such an attempt were made, the regression
coefficients would, if enough decimal places were carried in
the computation, turn out to be of the indeterminate 0/0 form.
If errors of observation are present, the regression coefficients
would not be exactly of the form 0/0, but would now appear in
the form of an error of observation divided by another error of
observation. On the face of it the result of the computation
would thus be determinate, but it would still have no meaning
as an expression for a regression coefficient. We would have
fictitious determinateness created by random errors.
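
The degenerate case just described can be illustrated numerically. The following is a minimal sketch and not part of the original text: the data and the helper name `moment` are invented for the illustration. Three variates lie exactly on a line through the origin, so both the numerator and the denominator of a regression coefficient, obtained from the normal equations by Cramer's rule, vanish.

```python
from fractions import Fraction

# Hypothetical data: every point is a multiple t of the direction (1, 2, 3),
# so two independent linear equations hold (x2 = 2*x1 and x3 = 3*x1).
# The values of t sum to zero, so the variates are measured from their means.
ts = [-2, -1, 0, 1, 2]
x1 = [Fraction(1 * t) for t in ts]
x2 = [Fraction(2 * t) for t in ts]
x3 = [Fraction(3 * t) for t in ts]

def moment(a, b):
    """Cross moment [a b]: sum of products over all observations."""
    return sum(u * v for u, v in zip(a, b))

# Normal equations for x3 = b1*x1 + b2*x2, solved by Cramer's rule.
denominator = moment(x1, x1) * moment(x2, x2) - moment(x1, x2) ** 2
numerator_b1 = moment(x1, x3) * moment(x2, x2) - moment(x2, x3) * moment(x1, x2)

# Both quantities are exactly zero: the coefficient b1 has the 0/0 form.
print(numerator_b1, denominator)
```

If small errors were added to the observations, both quantities would become small non-zero numbers and their ratio would be determined by the errors alone.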

When several variates are included in the analysis, the situa- 
tion becomes of course much more complex. We may here 
encounter a whole hierarchy where some of the variates may 
form a set where a regression equation has a meaning, and
others form sets where such equations have no meaning.
The study of this hierarchy is what I call confluence analysis. 
It is an important part of statistical analysis, particularly in 
the social sciences. Indeed, the data will frequently obey 
many more relations than those which the statistician happens 
to think of when he makes a particular regression study. If
the statistician does not have at his disposal an adequate technique for
the statistical study of the confluence hierarchy, he will run the
risk of adding more and more variates to the study until he
gets a set that is in fact multiply collinear and where his
attempt to determine a regression equation is therefore absurd.

In practice these cases are apt to arise much more frequently
than is usually recognised. As a matter of fact I
believe that a substantial part of the regression and correlation 
analyses which have been made on economic data in recent 
years is nonsense for this very reason. 

The sampling theory as we know it today does not furnish 
criteria that can distinguish between these various cases. 
Indeed, the standard errors of the regression coefficients, and
most of the other parameters used in sampling theory, themselves
become indeterminate in the very cases which we want
to analyse here, or, more precisely, they acquire a fictitious
determinateness created by random errors. I indicated this
theoretically in the above mentioned paper in Nordic Statistical 
Journal of 1928, and the example discussed in Section 33 of the 
present paper will illustrate clearly how inadequate the tools of 
sampling theory are for the study of the problems which we 
encounter in confluence analysis. 

Of course, this contains no reflection on the value of sampling 
theory in general. In problems of the kind encountered when 
the data are the result of experiments which the investigator can 
control, the sampling theory may render very valuable ser- 
vices. Witness the eminent works of R. A. Fisher and Wishart 
on problems of agricultural experimentation.

In the 1928 paper "Correlation and Scatter ..." I made a
first attempt at developing statistical criteria for the various
cases of confluence hierarchy. The criteria were based on a 
study of the smallness of the correlation determinants in the 
various subsets. The square roots of these determinants I 
termed scatter coefficients. In practice one will most frequently 
work with the determinant values themselves, which may 
perhaps be called scatterances.¹

The weak point in the method I suggested in 1928 was that 
no criteria were developed for judging the significance of the 
scatterances' deviation from zero. In the subsequent years I
reverted to the question on and off, on various occasions,
attempting to push the analysis further. The line of approach
which suggests itself from the viewpoint of sampling theory
is to attempt to find the sampling distribution of the scatterances.
I did not concentrate much on this aspect of the
problem, primarily because I felt that, at least when the data
are of an economic sort, this would not be the most fruitful
way of approach. Indeed, if the sampling aspect of the problem
should be studied from a sufficiently general set of
assumptions, I found that it would lead to such complicated
mathematics that I doubted whether anything useful would 
come out of it. And, on the other hand, if the sampling aspect 
should be studied under simple assumptions, for instance those of
non-collinear and normally distributed basic variates, the essence of
the confluence problem would not be laid bare. One would then
again get back to such a situation where those higher para- 
meters (standard errors, etc.), by which the first set of para- 
meters (scatterances, etc.) are to be judged, themselves become 
indeterminate in just those cases that interest us from the con- 
fluency view-point. One would have to consider standard 
errors of the standard errors, and standard errors of these 
higher standard errors again, and so on, up to such a high level 
of the standard error hierarchy, that the utility of this whole 
apparatus would become very dubious. 

I decided therefore first to attack the problem more from the
experimental side, working out numerically, on actual economic
data as well as on constructed examples, various other
types of criteria which intuitively and heuristically may
suggest themselves. These experiments converged towards a
definite method which, after applications to various kinds of
data, was found to give satisfactory and plausible results. The
present paper gives an account of this work.

¹ This term was suggested by Mr. Maurice H. Belz, Lecturer at the
University of Melbourne, who studied methods of confluence analysis at the
University Institute of Economics and in my Statistical Seminar in Oslo 1933.

Part I describes some of the procedures of confluence 
analysis with which I experimented before reaching the more 
satisfactory method. While these tentative procedures describ- 
ed in Part I do not, taken by themselves, give final, conclusive 
criteria of confluency and linear significance, they are not 
wholly without interest because they exhibit the nature of the 
difficulties involved, and may, when used with care, help in
making a rough preliminary analysis of the data.

Part II gives the theoretical background of the further 
analysis. It discusses the distinction between systematic variations
and disturbances and from this draws certain conclusions
regarding the "true" regression. In particular, certain facts are
pointed out regarding the connection between the
"true" regression and the empirical results obtained by least
square minimalisation in different directions. These facts give
the leading ideas of the subsequent method. In this connection
a general scheme of interpretation is also outlined for the
various principles of determining linear regressions which are
in common use or may suggest themselves as plausible.

Part III develops the method which I am now recommending 
as the most conclusive and which I feel gives a rather satisfac- 
tory solution of the main problems of linear confluence analysis. 
The computing technique to be used in this connection is
described in Section 15 and the essence of the principles of
interpretation is developed in Sections 16 to 18. Those who
are primarily interested in results will probably find these four 
sections the most important in the present work. 

Part IV gives numerical examples illustrating in detail the 
application of the technique proposed. One of these examples 
is based on constructed data, in order to give a means of check- 
ing how the tests proposed work in practice. This example 
also shows how utterly inadequate the usual sampling error 
analysis is as a means of testing significance when the data are 
linearly confluent. The other examples are drawn from actual 
data, particularly American consumption and sales statistics. 



As a by-product of the study of these latter examples, a
determination of the money flexibility is obtained by means of
two different reference commodities (meat and butter). It
may be compared with the results which I found in 1930
by using U. S. budget data.

The present study has been undertaken as an indispensable 
preliminary step for certain projects, namely statistical productivity
studies and the statistical construction of econometric functions
(demand and supply curves and the like) that are planned
as part of the research programme of the University Institute 
of Economics, Oslo. It is to be hoped, however, that the re- 
sults here presented may be applicable, not only to our special 
problems, but more generally to various kinds of problems 
where the statistical confluency of the data is an important 
feature to take into consideration. 

The amount of numerical work involved in the present study 
has been extraordinarily great. It would have been entirely 
impossible to carry it through if I had not had at my disposal 
the trained staff of computers now working at the University 
Institute of Economics. This Institute was established through 
generous grants from the Rockefeller Institution, New York 
and A/S Norsk Varekrig, Oslo. As directors of the Institute my 
colleague, Professor Wedervang, and I take this opportunity 
of extending our sincere thanks to these Institutions for the 
support received. 




PART I: CONFLUENCE ANALYSIS BY MEANS OF TEST- 
PARAMETERS. 

1. CORRELATION COEFFICIENTS AND SCATTERANCES. 

Most of the work in linear regression analysis can be based
on cross moments and correlation coefficients. If $x_1 \ldots x_n$ are
the observational variates measured from their means, and the
Gaussian symbol [ ] denotes a summation over all the observations,
the moments are

(1.1)    $m_{ij} = [x_i x_j]$

and the ordinary (gross) correlation coefficients are

(1.2)    $r_{ij} = \frac{m_{ij}}{\sqrt{m_{ii}\, m_{jj}}}$

The standard deviations of the variates are

(1.3)    $\sigma_i = \sqrt{m_{ii} / N}$

where N is the number of observations. The standardised variates
are

(1.4)    $x_i / \sigma_i$

The standardised variates have unit standard deviation. If the
variates are normalised in such a way that their sumsquare
over all the observations (not their standard deviation) is unity,
we get a set of variates whose cross moments are the same as
their correlation coefficients. Of course the difference between
the variates normalised in this way and the standardised
variates only lies in the factor $\sqrt{N}$.

I am here taking the correlation coefficients (1.2) simply as
a set of computational parameters without making any attempt
at interpreting them as a measure of the strength of the relation
between the variates.

In practical work it will be found more convenient to handle
the correlation coefficients than the moments, particularly 
because the former are reduced to a common order of magni- 
tude with the same meaning of the decimal places in all the 
magnitudes treated. 
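
The definitions of this Section can be sketched in a few lines. This is only an illustration: the observations and the function names are invented, and the correlation coefficient is taken in its usual form, the cross moment divided by the square root of the product of the two sumsquares. The last lines check the remark that variates normalised to unit sumsquare have cross moments equal to the correlation coefficients.

```python
import math

def moments(columns):
    """Cross moments m_ij = [x_i x_j] of variates measured from their means."""
    centred = [[v - sum(c) / len(c) for v in c] for c in columns]
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in centred] for ci in centred]

def correlations(m):
    """Correlation coefficients r_ij = m_ij / sqrt(m_ii * m_jj)."""
    n = len(m)
    return [[m[i][j] / math.sqrt(m[i][i] * m[j][j]) for j in range(n)] for i in range(n)]

# Hypothetical observations of three variates (N = 5).
data = [[1.0, 2.0, 3.0, 4.0, 5.0],
        [2.0, 1.0, 4.0, 3.0, 5.0],
        [5.0, 3.0, 4.0, 1.0, 2.0]]
N = len(data[0])
m = moments(data)
r = correlations(m)
sigma = [math.sqrt(m[i][i] / N) for i in range(len(m))]   # standard deviations

# Normalising each variate to unit sumsquare makes the cross moments equal r_ij.
unit = [[(v - sum(c) / N) / math.sqrt(m[i][i]) for v in c] for i, c in enumerate(data)]
m_unit = [[sum(a * b for a, b in zip(ci, cj)) for cj in unit] for ci in unit]
```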

In the computation of moments and correlation coefficients 
we have at the Oslo Institute developed certain checks which 
we have found very useful, but which are not, as far as I 
know, commonly practised. It may therefore be worth while 
to mention them in the present connection. We introduce,
as in the usual checking technique, the sum variate

(1.5)    $x_0 = x_1 + \ldots + x_n$.

But instead of computing all the cross moments $m_{i0}$ and
verifying for each i that

(1.6)    $m_{i0} = m_{i1} + \ldots + m_{in}$

we simply take the sumsquare $m_{00}$ and verify that

(1.7)    $m_{00} = m_{11} + \ldots + m_{nn} + 2 \sum_{i<j} m_{ij}$

where the summation $\sum_{i<j}$ means the sum of all the $\binom{n}{2}$ cross
moments that express interconnections between different variates.

If it is wanted to apply a check that can locate an error to 
a smaller section of the work, we rather prefer to split the 
series of observations into ranges, say of 10 or 20 observations 
in each range, and then apply (1. 7) to each range. 
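
The check (1.7), applied first to the whole series and then to each range, might be sketched as follows. The data, the range length of 4 and the function names are assumptions made for the illustration. The identity holds for raw moments as well as for moments about the mean, so no reduction to means is needed.

```python
def cross_moments(columns):
    """All products [x_i x_j] summed over the observations supplied."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]

def sum_check(columns):
    """Return both sides of (1.7) for the given block of observations."""
    n = len(columns)
    total = [sum(vals) for vals in zip(*columns)]      # the sum variate
    m = cross_moments(columns)
    lhs = sum(t * t for t in total)                    # sumsquare of the sum variate
    rhs = sum(m[i][i] for i in range(n)) + 2 * sum(
        m[i][j] for i in range(n) for j in range(i + 1, n))
    return lhs, rhs

# Hypothetical raw observations of three variates.
data = [[3, 1, 4, 1, 5, 9, 2, 6],
        [2, 7, 1, 8, 2, 8, 1, 8],
        [1, 6, 1, 8, 0, 3, 3, 9]]

# Check the whole series, then each range of 4 observations separately.
print("total", sum_check(data))
for start in range(0, 8, 4):
    block = [c[start:start + 4] for c in data]
    print("range starting at", start, sum_check(block))
```

A mistake in a single cross moment shows up as a disagreement in the range that contains it, which is the purpose of splitting the series.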

Generally we form these range-moments before reducing the
variates to measurement from their means. The only thing we
do to the raw data before computing the moments is to add
zeros to some of the variates in order to make all of them,
roughly speaking, of the same order of magnitude, and sometimes
to subtract a provisional mean in round numbers if the
figures are such that this can be done very quickly. From
these modified variates we form the sum variate (1.5), and
compute its sum $S_0$ over all the observations. This sum is
checked against the sums $S_i$ of the individual variates. (If
necessary this checking may also be split into ranges.) A
listing adding machine is convenient for this purpose.

The origin moments of the variates modified by the above
procedures may be denoted

(1.8)    $M_{ij}$

These are the cross moments that are formed and checked for
each range by (1.7). The total moments derived from the range
moments are also checked by (1.7). This being done, all the
rest of the work is only concerned with total results, not with
range results.

The "multiplied" mean moments

(1.9)    $s_{ij} = N M_{ij} - S_i S_j$

are formed (each quantity $s_{ij}$ being computed in one operation
on the machine). The results are checked by computing the
row sums

(1.10)    $s_{i0} = s_{i1} + \ldots + s_{in}$

and verifying that the sum of these row sums is equal to $s_{00}$
computed directly by (1.9). This check is equivalent to applying
(1.7); of course (1.7) also holds for the $s_{ij}$.

It will be noticed that (1.7) involves much less extra multiplication
than (1.6), and (1.7) is just as safe, with one exception,
namely that (1.7) does not register a mistake in interchanging
two (or more) of the moments. Particular attention
should therefore be given to the correct location of the figures
in the tables. A good safeguard against mistakes of this sort
is to compute each cross moment, for instance $M_{12}$, through for
all ranges before a new cross moment is taken. This means that
all checking by (1.7) is left to the end of the moment work,
and that all range-moments that do not check right away are
then recomputed.
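
The passage from origin moments to "multiplied" mean moments, together with the row-sum check, can be sketched as below. It is an assumption of this sketch that the multiplied mean moment is N times the moment about the mean, that is, N times the origin moment minus the product of the two sums; the data are invented.

```python
import math

# Hypothetical raw observations (not reduced to their means).
data = [[12, 15, 11, 18, 14, 16],
        [31, 35, 30, 39, 33, 36],
        [7, 5, 8, 4, 6, 6]]
N, n = len(data[0]), len(data)

S = [sum(c) for c in data]                                   # sums of the variates
M = [[sum(a * b for a, b in zip(ci, cj)) for cj in data] for ci in data]  # origin moments

# Assumed form of the "multiplied" mean moments: N times the moment about the mean.
s = [[N * M[i][j] - S[i] * S[j] for j in range(n)] for i in range(n)]

# Row-sum check: the row sums must add up to the same quantity
# computed directly for the sum variate.
row_sums = [sum(row) for row in s]
total = [sum(vals) for vals in zip(*data)]
s_00 = N * sum(t * t for t in total) - sum(total) ** 2
check_ok = sum(row_sums) == s_00

# Correlation coefficients from the multiplied moments; the factor N cancels.
r = [[s[i][j] / math.sqrt(s[i][i] * s[j][j]) for j in range(n)] for i in range(n)]
print(check_ok, r[0][1])
```

Working in integers until the final division keeps the check exact, which matches the spirit of a machine computation.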

In terms of the $s_{ij}$ the correlation coefficients are

(1.11)    $r_{ij} = \frac{s_{ij}}{\sqrt{s_{ii}}\,\sqrt{s_{jj}}}$

The dividing out by the factors in the denominator of (1.2)
or (1.11) is usually, so far as a systematic check is concerned,
a weak point in correlation computations. We check this
part of the work as follows. First the numbers $\sqrt{s_{ii}}$ and $1/\sqrt{s_{ii}}$
(i = 1, 2 ... n) are computed and checked individually by squaring
and multiplication. Then the correlation coefficients are computed
by (1.11) and the row sums

(1.12)

are formed. By means of these it is checked that

(1.13)

where $s_{i0}$ are the row sums (1.10) previously computed.

If (1. 13) does not check right away, each row in the corre- 
lation matrix may be checked separately by 

(1.14) 

The checks (1.13) and (1.14) of course also contain a verification
of the root extraction of the $s_{ii}$ and of the divisions
$1/\sqrt{s_{ii}}$, so that actually all steps are checked by this method. In
practical work with several variates this checking technique
has been found very helpful.

In all the matrices here considered, $s_{ij}$, $r_{ij}$ etc., only the
diagonal and one of the triangles need to be filled in, since the
matrices are symmetric. As a rule we use the north-east
triangle. Taking a row sum (or a column sum) in such a
matrix means taking the sum of the elements in a broken line
reflected under 45° on the principal diagonal.

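The correlation determinants, i. e. the scatterances, we denote

(1.15)    $\Delta = \begin{vmatrix} r_{11} & r_{12} & \ldots & r_{1n} \\ \ldots & \ldots & \ldots & \ldots \\ r_{n1} & r_{n2} & \ldots & r_{nn} \end{vmatrix}$

In the subset (ij ... k) we use the notation

(1.16)    $\Delta_{ij \ldots k} = \begin{vmatrix} r_{ii} & r_{ij} & \ldots & r_{ik} \\ r_{ji} & r_{jj} & \ldots & r_{jk} \\ \ldots & \ldots & \ldots & \ldots \\ r_{ki} & r_{kj} & \ldots & r_{kk} \end{vmatrix}$

These determinants in all possible subsets are computed most
easily by the tilling technique described in Section 15.

The diagonal elements in (1.15) and (1.16) are all equal to
unity. If these diagonal elements are replaced by zeros, we get
the "hollow" correlation determinants

(1.17)    $\Gamma = \begin{vmatrix} 0 & r_{12} & \ldots & r_{1n} \\ \ldots & \ldots & \ldots & \ldots \\ r_{n1} & r_{n2} & \ldots & 0 \end{vmatrix}$

and similarly $\Gamma_{ij \ldots k}$.

These hollow determinants are convenient for certain computation
purposes.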
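
As an illustration, the scatterances and the hollow determinants in all subsets of a small set can be computed directly. This sketch uses plain elimination with exact fractions and is not the tilling technique of the text; the correlation matrix and the function names are invented.

```python
from fractions import Fraction
from itertools import combinations

def det(a):
    """Determinant by Gaussian elimination with exact fractions."""
    a = [[Fraction(x) for x in row] for row in a]
    n, d = len(a), Fraction(1)
    for c in range(n):
        p = next((k for k in range(c, n) if a[k][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for k in range(c + 1, n):
            f = a[k][c] / a[c][c]
            for q in range(c, n):
                a[k][q] -= f * a[c][q]
    return d

def sub_determinant(r, subset, hollow=False):
    """Correlation determinant in a subset; zeros on the diagonal if hollow."""
    return det([[0 if (hollow and i == j) else r[i][j] for j in subset] for i in subset])

F = Fraction
# Hypothetical correlation matrix for three variates (affixes 0, 1, 2).
r = [[F(1), F(1, 2), F(3, 10)],
     [F(1, 2), F(1), F(2, 5)],
     [F(3, 10), F(2, 5), F(1)]]

for size in (2, 3):
    for subset in combinations(range(3), size):
        print(subset, sub_determinant(r, subset), sub_determinant(r, subset, hollow=True))
```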

For very large n an approximation to the scatterance $\Delta$
may be computed by retaining only the first terms in the expansion
(2.9) and computing by the following formulae the first
of the hollow determinants $\Gamma$ from which the $B_k$ (mentioned
in Section 2) are built up.

If we write the expressions for $\Gamma$ out explicitly we find that
the first few even-rowed determinants of this type can be expressed
by means of the following two sets of magnitudes which
themselves can be computed recurrently. We first define

(1.18)    $r_{ijkh} = \sum_{\alpha<\beta} r_{\alpha\beta}\, r_{rem}$

where the summation runs through combinations without repetition
of the two affixes $\alpha$, $\beta$ selected amongst the four affixes
(ijkh), and $r_{rem}$ means the correlation coefficient that has the two
affixes which remain when $\alpha$ and $\beta$ are taken away from the
set (ijkh). Writing (1.18) out in full for (1234) we get for instance

(1.19)    $r_{1234} = r_{12} r_{34} + r_{13} r_{24} + r_{23} r_{14} + r_{14} r_{23} + r_{24} r_{13} + r_{34} r_{12}$

that is

(1.20)    $r_{1234} = 2 (r_{12} r_{34} + r_{13} r_{24} + r_{23} r_{14})$.

Similarly we define

(1.21)    $r_{ijkhlm} = \sum_{\alpha<\beta} r_{\alpha\beta}\, r_{rem\text{-}four}$

where $r_{rem\text{-}four}$ is the quantity r with the four affixes that remain
when ($\alpha$, $\beta$) are taken away from the set (ijkhlm). In this way
we may continue and define r with any even number of subscripts.

A similar system may be built on the $\Gamma$'s. Here we need
however a more elaborate classification, namely one that indicates
the various ways in which the affixes on $\Gamma$ may be
divided into subgroups. The number of affixes in each subgroup
we shall indicate by superscripts. For the hollow determinants
themselves we write

(1.22)    $\Gamma^{(2)}_{ij}$, etc.

Then we define

(1.23)    $\Gamma^{(2,2)}_{ijkh} = \sum_{\alpha<\beta} \Gamma^{(2)}_{\alpha\beta}\, \Gamma^{(2)}_{rem}$

($\alpha$, $\beta$) running as before through combinations without repetition
in the set (ijkh). Writing out $\Gamma^{(2,2)}_{1234}$ in full we get a formula
similar to (1.20), only with $\Gamma$ instead of r. Further we
define

(1.24)    $\Gamma^{(2,2,2)}_{ijkhlm} = \sum_{\alpha<\beta} \Gamma^{(2)}_{\alpha\beta}\, \Gamma^{(2,2)}_{rem}$

which is analogous to (1.21). On the other hand

(1.25)    $\Gamma^{(2,4)}_{ijkhlm} = \sum_{\alpha<\beta} \Gamma^{(2)}_{\alpha\beta}\, \Gamma^{(4)}_{rem}$

(1.26)    $\Gamma^{(3,3)}_{ijkhlm} = \sum_{\alpha<\beta<\gamma} \Gamma^{(3)}_{\alpha\beta\gamma}\, \Gamma^{(3)}_{rem}$    etc.

are now types of combinations not represented in the r's. In
(1.25) we would of course have obtained the same expression
by extending the summation to combinations without repetition
of the 4 affixes ($\alpha\beta\gamma\delta$) and letting the remaining affixes be only
two in number.

By means of the above symbols the first few even-rowed $\Gamma$'s
may be expressed as follows

(1.27)    $\Gamma_{ij} = -r_{ij}^2$

(1.28)    $\Gamma_{ijkh} = \Gamma^{(2,2)}_{ijkh} - \left(\tfrac{1}{2}\, r_{ijkh}\right)^2$

(1.29)    $\Gamma_{ijkhlm} = \tfrac{1}{2}\,\Gamma^{(3,3)}_{ijkhlm} + 2\,\Gamma^{(2,4)}_{ijkhlm} - \Gamma^{(2,2,2)}_{ijkhlm} - \left(\tfrac{1}{6}\, r_{ijkhlm}\right)^2$

In order to express the uneven-rowed $\Gamma$'s we need the magnitudes
$\Gamma^{(2,3)}_{ijkhl}$ etc. defined similarly to (1.25) and (1.26), and
further these magnitudes defined for an incomplete summation,
that is by letting the summation run over all affixes except some
specified affix p. This leads to defining

(1.30)    $\Gamma^{(2,3)}_{p.ijkhl} = \sum_{\alpha<\beta} \Gamma^{(2)}_{\alpha\beta}\, \Gamma^{(3)}_{rem}$

where the summation runs over combinations without repetition
of the two affixes picked amongst the set of four affixes
obtained by leaving p out of the set (ijkhl). Similarly we define
the incomplete r

(1.31)    $r_{p.ijkhl} = \sum_{\alpha} r_{p\alpha}\, r_{rem\text{-}four}$

where $\alpha$ runs through (ijkhl) except p; p may be called the
"skew affix" in (1.30) and (1.31). As an example we may take

(1.32)    $r_{1.12345} = r_{12} r_{1345} + r_{13} r_{1245} + r_{14} r_{1235} + r_{15} r_{1234}$.

In terms of the quantities (1.30) and (1.31) we have

(1.33)    $\Gamma_{ijk} = 2\, r_{ij}\, r_{ik}\, r_{jk}$

(1.34)    $\Gamma_{ijkhl} = \Gamma^{(2,3)}_{ijkhl} + \Gamma^{(2,3)}_{i.ijkhl} + \tfrac{1}{4}\, r_{i.ijkhl} \cdot r_{jkhl}$.

The formula (1.34) is developed with i as the "skew" affix, but
we may just as well use any of the other affixes for this
purpose.
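
The explicit formulae for the four-, five- and six-rowed hollow determinants can be checked numerically. The sketch below rests on a stated reading of the definitions, since the printed formulae are damaged in the available copy: the r with an even number of affixes and the Γ with superscripts are taken as sums over pairs (or triples) of affixes multiplied by the corresponding quantity on the remaining affixes, and the skew affix is taken to stay in the remaining set. The matrix entries and all function names are invented. Under that reading the three identities hold exactly.

```python
from fractions import Fraction
from itertools import combinations

def det(a):
    """Determinant by Gaussian elimination with exact fractions."""
    a = [[Fraction(x) for x in row] for row in a]
    n, d = len(a), Fraction(1)
    for c in range(n):
        p = next((k for k in range(c, n) if a[k][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for k in range(c + 1, n):
            f = a[k][c] / a[c][c]
            for q in range(c, n):
                a[k][q] -= f * a[c][q]
    return d

# Hypothetical symmetric coefficients r_ij for six variates (affixes 0..5).
n = 6
r = [[Fraction(0)] * n for _ in range(n)]
for i, j in combinations(range(n), 2):
    r[i][j] = r[j][i] = Fraction((i + 1) * (j + 2) % 7 - 3, 10)

def gamma(s):
    """Hollow determinant on the affixes s (zeros on the diagonal)."""
    return det([[0 if i == j else r[i][j] for j in s] for i in s])

def rest(s, part):
    """Affixes of s that remain when those in part are taken away."""
    return tuple(i for i in s if i not in part)

def r_even(s):
    """r with 2, 4, 6 ... affixes, defined recurrently over pairs."""
    if len(s) == 2:
        return r[s[0]][s[1]]
    return sum(r[a][b] * r_even(rest(s, (a, b))) for a, b in combinations(s, 2))

def g(s, sizes):
    """Gamma with superscripts sizes, for instance (2, 2), (2, 4) or (3, 3)."""
    if len(sizes) == 1:
        return gamma(s)
    return sum(gamma(p) * g(rest(s, p), sizes[1:]) for p in combinations(s, sizes[0]))

s4, s5, s6 = (0, 1, 2, 3), (0, 1, 2, 3, 4), (0, 1, 2, 3, 4, 5)

# Four-rowed and six-rowed hollow determinants.
ok_four = gamma(s4) == g(s4, (2, 2)) - (r_even(s4) / 2) ** 2
ok_six = gamma(s6) == (g(s6, (3, 3)) / 2 + 2 * g(s6, (2, 4))
                       - g(s6, (2, 2, 2)) - (r_even(s6) / 6) ** 2)

# Five-rowed hollow determinant with affix 0 as the skew affix.
p = 0
others = rest(s5, (p,))
g_skew = sum(gamma(q) * gamma(rest(s5, q)) for q in combinations(others, 2))
r_skew = sum(r[p][a] * r_even(rest(s5, (a,))) for a in others)
ok_five = gamma(s5) == g(s5, (2, 3)) + g_skew + r_skew * r_even(others) / 4

print(ok_four, ok_five, ok_six)
```

The identities are purely algebraic, so the test matrix need not be a genuine correlation matrix.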

The above formulae have little interest when it is wanted to
compute all possible scatterances in a set of reasonably large
dimensionality. In that case one will use the tilling technique
of Section 15. But the above explicit $\Gamma$-formulae may serve
to determine successive approximations to a given scatterance
of very high order. Indeed the successive terms $B_0$, $B_1$, etc.
of the formula (2.9) of the next Section may be looked upon as
successive terms in a Taylor series for $\Delta$. $B_0$ is equal to 1, $B_1$
equal to zero, $B_2$ the sum of all two-rowed $\Gamma$ in the big set for which
the scatterance is to be computed, etc. By letting (ij) in (1.27),
(ijk) in (1.33) etc. run through all possible combinations in the
big set we thus obtain the first terms of the scatterance in
question.

As an example of the nature of the approximation obtained 
by these successive terms we take the following scatterance 
in the potato data of Section 31. 


TABLE (1.35). POTATO DATA. SUCCESSIVE TERMS IN $\Delta_{12345678}$

First term       $B_0$  =   1.000000
Second term      $B_2$  = - 1.793000
Third term       $B_3$  =   1.448480
Fourth term      $B_4$  = -  .392033
Fifth term       $B_5$  =    .010346
Sixth term       $B_6$  =    .010648
Seventh term     $B_7$  = -  .001132
Eighth term      $B_8$  = -  .000005

Total $\Delta$ (in eight set)  =    .283304
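
The way the successive terms approach the total can be followed by forming the running sums of the tabulated values. The sketch below only re-adds the eight figures printed in the table; nothing else is assumed.

```python
from decimal import Decimal
from itertools import accumulate

# The eight successive terms as printed in the table, in order.
terms = ["1.000000", "-1.793000", "1.448480", "-0.392033",
         "0.010346", "0.010648", "-0.001132", "-0.000005"]

# Running sums: the approximation obtained after each additional term.
partial = list(accumulate(Decimal(t) for t in terms))
for count, value in enumerate(partial, start=1):
    print(count, value)

total = partial[-1]   # 0.283304
```

The first three partial sums are still far from the total, while from the fourth term onward the approximation is within about 0.02 of it.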


How can the scatterances be interpreted as indicators of
linear confluency?

In the first place we note that the general tendency of the
$\Delta$'s will be to be all the smaller, the better the linear connection
in the set considered. (For a geometric interpretation of
the scatterances from this viewpoint see Section 2 in Part II
of "Correlation and Scatter ...".) But we are not only concerned
with finding a set about which it can be said that its
variates are linearly connected. From the confluency viewpoint
it is just as important to ascertain that the set considered
is simply collinear, which means that not all its first subsets
are collinear. (The first subsets are the sets obtained by
leaving out one of the variates at a time.) Therefore some
comparison must be made between the scatterance in the set
considered and the scatterances in the various first subsets.
These latter we shall call the subscatterances.

The general tendency which we must first of all look for
when we compare subscatterances and the scatterance in the
bigger set is whether there is a sharp decline when we pass
from the former to the latter. We are particularly interested
in seeing whether the scatterance in the bigger set is much
smaller than the smallest of the subscatterances. If this is not
so, the new variate that is added as we pass to the bigger set
cannot be looked upon as important. But on the other hand it
does not do any great harm. It is rather a neutral variate. For
the moment we shall not go into any more detailed discussion
of the nature of the variate in this case. The complete analysis
in this case cannot be given only on the basis of the scatterances,
and must therefore be postponed till later (see in particular
Section 17).

Only if there is a substantial drop in the scatterances do we
have a situation where the passage to the bigger set may constitute
a significant progress in the analysis. But here we must
be careful: there are two possibilities. A sharp drop in the
observed scatterances will be produced, not only if there exists
in the bigger set one linear connection between the variates
which is much more perfectly fulfilled than any linear connection
in the subsets, but also if all the subsets were already
systematically connected. Indeed,
in this case the erratic element will get a much smaller chance
of keeping the scatterance in the bigger set up from zero. This
is plausible already intuitively, and can also be deduced from
the theoretical considerations of Section 8 and the numerical
examples in Part IV. Thus a sharp decline as we pass from
the subscatterances to the scatterance in the bigger set may
be either a warning signal that we get into a multiply collinear
set where a regression equation has no meaning, or it may be
a criterion that we get a set where the regression equation is
more exact than before.

Incidentally this shows how absurd it is to use the multiple 
correlation coefficient in the way in which it is usually employ- 
ed. The multiple correlation coefficient is indeed essentially 
determined by the ratio between a scatterance and a subscatterance.
(The reader may for instance compute the multiple
correlation coefficient of 4 on (123) in the meaningless set
(1234) in Section 23. It turns out to be 0.99. The scatterances needed
for this computation are to be found in the tilling tables of
Section 23.)
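
The dependence of the multiple correlation coefficient on the ratio between a scatterance and a subscatterance can be made concrete. The sketch uses the standard identity that one minus the squared multiple correlation of a variate on the others equals the scatterance of the whole set divided by the scatterance of the set without that variate. Both correlation matrices are invented; they are not the data of Section 23.

```python
from fractions import Fraction

def det(a):
    """Determinant by Gaussian elimination with exact fractions."""
    a = [[Fraction(x) for x in row] for row in a]
    n, d = len(a), Fraction(1)
    for c in range(n):
        p = next((k for k in range(c, n) if a[k][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for k in range(c + 1, n):
            f = a[k][c] / a[c][c]
            for q in range(c, n):
                a[k][q] -= f * a[c][q]
    return d

def scatterance(r, subset):
    return det([[r[i][j] for j in subset] for i in subset])

def multiple_r_squared(r, dependent, others):
    """Squared multiple correlation of `dependent` on `others`."""
    return 1 - scatterance(r, tuple(others) + (dependent,)) / scatterance(r, tuple(others))

F = Fraction
# Hypothetical moderately correlated set.
moderate = [[F(1), F(1, 2), F(3, 10)],
            [F(1, 2), F(1), F(2, 5)],
            [F(3, 10), F(2, 5), F(1)]]
# Hypothetical set in which every pair is already almost perfectly correlated.
flat = [[F(1) if i == j else F(99, 100) for j in range(3)] for i in range(3)]

R2_moderate = multiple_r_squared(moderate, 2, (0, 1))
R2_flat = multiple_r_squared(flat, 2, (0, 1))
print(float(R2_moderate), float(R2_flat))
```

In the second set the coefficient is close to unity although every subset is itself nearly collinear, which is the situation the text warns against.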

Which one of the above two alternatives we are confronting
depends on whether the deviations from zero of the subscatterances
were systematic or accidental. Is there any feature in the
distribution of the subscatterances that can reveal anything
about this? If we are to reckon with all conceivable possibilities,
then virtually any observed distribution of the scatterances
can be interpreted just as well as due to erratic influences
as due to systematic ones. But practically speaking
it seems probable that there is all the more chance of obtaining
an even distribution of the subscatterances the more exclusively
they are determined by random errors. Therefore,
if one or more of the subscatterances deviate considerably from
the minimum subscatterance, it seems plausible to conclude
that we do not have multiple collinearity in the bigger set. But when
all the subscatterances are more or less equal we must be prepared
for such a possibility. Of course it may conceivably
happen that the subscatterances turn out approximately equal
even if their deviations from zero are essentially systematic, but
so long as we base our confluency conclusions only on scatterances
we have no means of recognising this case. If we want
to play safe we ought therefore to refuse to pass on to the
bigger set whenever the subscatterances are nearly equal and
of such a size that it is not obvious that they are systematically
different from zero. The method of Part III will furnish
a more refined criterion that permits a more definite conclusion
even in the case which we must thus leave in suspense when
we use only scatterances.

To resume we may formulate the rule for the interpretation
of scatterances as follows:

I. Let n variates be observed. If it is contemplated to determine
a regression equation containing $\nu$ of these variates,
all possible $\nu$-dimensional sets which can be formed in
the big n-dimensional set should be investigated and that
one, or those $\nu$-dimensional sets which have the smallest
scatterance should be selected for a further scrutiny.

II. In each such $\nu$-dimensional set selected for further scrutiny
all the subscatterances should be considered. If the
scatterance in the $\nu$-dimensional set is not appreciably
smaller than the smallest subscatterance contained in it,
there is neither any great harm nor any great use in considering
the $\nu$-dimensional set in question. We could just
as well be satisfied with that $(\nu-1)$-dimensional set which
has the smallest subscatterance.

III. If there is a sharp decline as we pass from the subscatterances
to the scatterance in the $\nu$-dimensional set, two
possibilities are present. Either this decline means that
it is a significant progress to form a regression equation
in this $\nu$-dimensional set, or it means that it is particularly
dangerous. If at least one of the subscatterances
is great it is probably safe to pass on to the bigger set.
Even if all the subscatterances are small it may be
advisable to accept the bigger set, provided that there
is a considerable spread in the subscatterances, for instance
if there is one subscatterance that is decidedly smaller
than the other subscatterances or if there is at least one
subscatterance that is definitely larger than the others.

IV. But if all the subscatterances are about equal, and rather
small,¹ then the bigger set must be refused even though
its scatterance is much less than the subscatterances.
Conceivably it might even in this case have been correct
to accept the bigger set, but the scatterances do not give
a means of finding out whether this is permitted or not.

V. If by the above criteria there is more than one $\nu$-dimensional
set which it seems plausible to accept, one should
proceed to scrutinising all the $(\nu+1)$-dimensional sets according
to the criteria (I) to (IV), omitting however any
set that contains a subset which by (IV) has already been
recognised as dangerous.
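
Rules II to IV can be sketched as a small decision routine. The text gives no numerical thresholds, so the three parameters `drop`, `great` and `spread` below are arbitrary assumptions chosen only to make the sketch run, and the three correlation matrices are invented. The routine is an illustration of the reasoning, not a substitute for the judgment the rules call for.

```python
from itertools import combinations

def det(a):
    """Determinant by Laplace expansion along the first row (small sets only)."""
    if not a:
        return 1.0
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))

def scatterance(r, subset):
    return det([[r[i][j] for j in subset] for i in subset])

def examine(r, subset, drop=0.5, great=0.5, spread=2.0):
    """Classify a set by comparing its scatterance with its subscatterances.

    drop, great and spread are hypothetical thresholds, not taken from the text.
    """
    full = scatterance(r, subset)
    subs = [scatterance(r, s) for s in combinations(subset, len(subset) - 1)]
    smallest, largest = min(subs), max(subs)
    if full > drop * smallest:
        return "neutral"      # rule II: no appreciable decline
    if largest >= great or largest > spread * smallest:
        return "accept"       # rule III: a great subscatterance or a wide spread
    return "refuse"           # rule IV: subscatterances all small and about equal

# Third variate depends on two uncorrelated variates.
good = [[1.0, 0.0, 0.7], [0.0, 1.0, 0.7], [0.7, 0.7, 1.0]]
# Every pair is already almost perfectly correlated.
flat = [[1.0, 0.99, 0.99], [0.99, 1.0, 0.99], [0.99, 0.99, 1.0]]
# Third variate is unrelated to the other two.
idle = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]

for name, matrix in (("good", good), ("flat", flat), ("idle", idle)):
    print(name, examine(matrix, (0, 1, 2)))
```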

¹ For instance, not very far from being of the same order of magnitude
as that which would be expected in a scatterance for the same number of
random variates. Some information about this order of magnitude may be
obtained from the tables of Section 30.

2. THE ELLIPSOID METHOD AND THE CHARACTERISTIC
POLYNOMIAL.

When I started the experimental work of trying to find other
criteria that may replace or supplement the scatterances as
tests of confluency, the first idea followed up was to study the
characteristic roots of the correlation determinants. The
characteristic roots $\lambda_1 \ldots \lambda_n$ of the determinant (1.15) are
defined as the zeros of the characteristic polynomial

(2.1)    $P(\lambda) = \begin{vmatrix} r_{11}-\lambda & r_{12} & \ldots & r_{1n} \\ r_{21} & r_{22}-\lambda & \ldots & r_{2n} \\ \ldots & \ldots & \ldots & \ldots \\ r_{n1} & r_{n2} & \ldots & r_{nn}-\lambda \end{vmatrix}$

The expansion of this polynomial is

(2.2)    $P(\lambda) = A_n - A_{n-1}\lambda + \ldots + (-)^n A_0 \lambda^n$

where $A_k$ is the sum of all the $\binom{n}{k}$ k-rowed principal minors in
(1.15). By convention $A_0 = 1$. Obviously $A_n$ is nothing but the
determinant (1.15) itself. Introducing $1 - \lambda = \zeta$ the expansion
takes on the form

(2.3)    $P(\lambda) = Q(\zeta) = B_n + B_{n-1}\zeta + \ldots + B_0 \zeta^n$

where the B's are the corresponding sums of principal minors
in the hollow determinant.

In the expansion (2.3) the second highest power of $\zeta$ is lacking
because the one-rowed principal minors in the hollow determinant
(1.17) consist only of zeros.

The value of the characteristic polynomial for a given value 
of X (or of may be computed directly by inserting the value 
of yt (or f) in (2. 1) and evaluating the determinant as it stands. 
Or the computation may be made from (2. 2) or (2. 3) by first 
evaluating the coefficients A or B. The latter method is to be 
preferred when many ordinates are wanted. 

Since all the characteristic roots are real, there is in general 
no particular difficulty in determining them by one of the usual 
approximation methods. In a three dimensional set they are 
easily determined in explicit form from Q{^) since the second 
highest power of J is here lacking. 
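
The two routes to the ordinates may be illustrated by a minimal sketch, which forms the coefficients $A_k$ and $B_k$ as sums of principal minors and compares the direct evaluation of (2. 1) with the expansions (2. 2) and (2. 3). The three-variate correlation matrix and all function names are assumptions chosen for the illustration; they do not come from the text.

```python
from itertools import combinations

def det(m):
    """Determinant by expansion along the first row (adequate for small sets)."""
    if not m:
        return 1.0  # empty determinant, so that A_0 = B_0 = 1
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def principal_minor_sums(m):
    """Return [S_0, ..., S_n], S_k being the sum of all k-rowed principal minors."""
    n = len(m)
    return [sum(det([[m[i][j] for j in idx] for i in idx])
                for idx in combinations(range(n), k))
            for k in range(n + 1)]

# Hypothetical three-variate correlation matrix (all correlations 0.5).
R = [[1.0, 0.5, 0.5],
     [0.5, 1.0, 0.5],
     [0.5, 0.5, 1.0]]
n = len(R)
hollow = [[0.0 if i == j else R[i][j] for j in range(n)] for i in range(n)]

A = principal_minor_sums(R)       # coefficients of (2. 2)
B = principal_minor_sums(hollow)  # coefficients of (2. 3); B[1] vanishes

def p_direct(lam):
    """P(lambda) from the determinant (2. 1) as it stands."""
    return det([[R[i][j] - (lam if i == j else 0.0) for j in range(n)]
                for i in range(n)])

def p_from_a(lam):
    """P(lambda) from the expansion (2. 2)."""
    return sum(A[k] * (-lam) ** (n - k) for k in range(n + 1))

def q_from_b(zeta):
    """Q(zeta) from the expansion (2. 3)."""
    return sum(B[k] * zeta ** (n - k) for k in range(n + 1))

lam = 0.3
print(A, B)
print(p_direct(lam), p_from_a(lam), q_from_b(1.0 - lam))
```

The three printed ordinates should agree up to rounding; once the coefficients are stored, each further ordinate costs only a polynomial evaluation, which is the advantage the text points to.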

The individual terms of the coefficients $A$, that is the
scatterances themselves, are determined most easily by the
tilling technique of Section 15. If it is wanted to determine the
coefficients $A$ in all subsets $(\alpha, \beta \ldots \gamma)$ they may be built up from
the scatterances by the recurrence formula

(2. 4)   $(\nu - k)\, A_{k(\alpha, \beta \ldots \gamma)} = \sum_i A_{k(\alpha, \beta \ldots)i(\ldots \gamma)}$

where $(\alpha, \beta \ldots \gamma)$ is any $\nu$-dimensional set, and $A_{k(\alpha, \beta \ldots \gamma)}$ the
coefficient of $(-\lambda)^{\nu-k}$ in this set. The inverted parenthesis denotes
"exclusion of". For instance for $\nu = 4$, $k = 2$:
$2A_{2(1234)} = A_{2(123)} + A_{2(124)} + A_{2(134)} + A_{2(234)}$.
As a convenient check on (2. 4) we
have the fact that the magnitude determined as the sum of
the elements in the right member of (2. 4) shall be divisible by
$(\nu - k)$.

The recurrent computation of the $A$'s by (2. 4) can be checked
for each $\nu$-level by the formula

(2. 5)   $\sum_{(\alpha\beta\ldots\gamma)} A_{k(\alpha\beta\ldots\gamma)} = \binom{n-k}{\nu-k} A_{k(12\ldots n)}$

The sum in the left member of this formula is simply the sum
of all the $A_k$ in all possible $\nu$-sets contained in the big set
$(12 \ldots n)$.

The formula (2. 5) is obtained by the following consideration.
The sum in the left member of (2. 5) obviously contains all
possible $k$-rowed minors contained in the big set, each such $k$-rowed minor
being involved a certain number of times. In other words the
left member of (2. 5) must be equal to $A_{k(12\ldots n)}$ multiplied by a
certain integer. This integer is determined simply by comparing
the total number of terms in the left member of (2. 5) with
the total number of terms in $A_{k(12\ldots n)}$. The former number
is equal to the number of terms in each $A_{k(\alpha\beta\ldots\gamma)}$, namely $\binom{\nu}{k}$,
times the number of $A_{k(\alpha\beta\ldots\gamma)}$ entering into the summation in
the left member of (2. 5), namely $\binom{n}{\nu}$. And the number of terms
in $A_{k(12\ldots n)}$ is $\binom{n}{k}$. The integer in question is consequently
equal to $\binom{\nu}{k}\binom{n}{\nu} / \binom{n}{k} = \binom{n-k}{\nu-k}$.

A similar formula obviously holds good for the $B_k$, or
more generally for any set of magnitudes that are built up in
a similar way as a sum of elements defined for each $(\alpha\beta\ldots\gamma)$
combination.

In practice the check will take the form that on each $\nu$-level,
all the numbers in the left member of (2. 4) are computed. Then
$A_{k(12\ldots n)}$ is formed directly, and it is verified that (2. 5) holds
good.
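
The recurrence (2. 4) and the level check (2. 5) can be tried out numerically. The sketch below is only an illustration under stated assumptions: the four-variate correlation matrix is hypothetical, the function names are invented for the example, and the coefficients are formed by brute force from principal minors rather than by the tilling technique.

```python
from itertools import combinations
from math import comb

def det(m):
    """Determinant by expansion along the first row (adequate for small sets)."""
    if not m:
        return 1.0
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def a_coeff(R, subset, k):
    """A_k within `subset`: the sum of all k-rowed principal minors of that set."""
    return sum(det([[R[i][j] for j in idx] for i in idx])
               for idx in combinations(subset, k))

# Hypothetical four-variate correlation matrix.
R = [[1.00, 0.30, 0.20, 0.10],
     [0.30, 1.00, 0.40, 0.25],
     [0.20, 0.40, 1.00, 0.50],
     [0.10, 0.25, 0.50, 1.00]]
n = len(R)
big = tuple(range(n))

def recurrence_gap(subset, k):
    """Left member minus right member of (2. 4) for one set."""
    nu = len(subset)
    right = sum(a_coeff(R, tuple(x for x in subset if x != i), k) for i in subset)
    return (nu - k) * a_coeff(R, subset, k) - right

def level_gap(nu, k):
    """Left member minus right member of (2. 5) on one nu-level."""
    left = sum(a_coeff(R, s, k) for s in combinations(big, nu))
    return left - comb(n - k, nu - k) * a_coeff(R, big, k)

worst_24 = max(abs(recurrence_gap(s, k))
               for nu in range(2, n + 1)
               for s in combinations(big, nu)
               for k in range(nu))
worst_25 = max(abs(level_gap(nu, k))
               for nu in range(1, n + 1)
               for k in range(nu + 1))
print(worst_24, worst_25)  # both should be zero up to rounding
```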

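From the $A$'s we may pass to the $B$'s or vice versa by the
formula

(2. 6)   $B_{k(\alpha, \beta \ldots \gamma)} = \sum_{h=0}^{k} (-)^{k-h} \binom{\nu-h}{k-h} A_{h(\alpha, \beta \ldots \gamma)}$

and

(2. 7)   $A_{k(\alpha, \beta \ldots \gamma)} = \sum_{h=0}^{k} \binom{\nu-h}{k-h} B_{h(\alpha, \beta \ldots \gamma)}$

For $k = \nu$ (2. 6) gives in particular in the big set, i. e. for
$\nu = n$

(2. 8)   $B_n = \sum_{h=0}^{n} (-)^{n-h} A_h$

From (2. 7) we similarly get

(2. 9)   $A_n = \sum_{h=0}^{n} B_h$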
As a final check on all the quantities $A$ the "hollow" determinant
$B_n$ in the big set may be computed both by (2. 8)
and by evaluating the determinant directly. The formulae (2. 4)
to (2. 7) may also be utilized for various other checking purposes;
for instance if the $B$'s in the big set have been computed,
but not the $A$'s, and if one proceeds to computing the $A$'s
in all subsets, then on each $\nu$-level (2. 5) may be used as a
check by inserting in its right member the expression for the
big set $A$'s in terms of the big set $B$'s.
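
The passage between the two sets of coefficients may be sketched as follows. The coefficients used are those of a hypothetical three-variate set in which all correlations equal 0.5, and the function names are assumptions made for the illustration.

```python
from math import comb

def b_from_a(A):
    """Pass from [A_0, ..., A_nu] to [B_0, ..., B_nu] by (2. 6)."""
    nu = len(A) - 1
    return [sum((-1) ** (k - h) * comb(nu - h, k - h) * A[h] for h in range(k + 1))
            for k in range(nu + 1)]

def a_from_b(B):
    """Pass from [B_0, ..., B_nu] to [A_0, ..., A_nu] by (2. 7)."""
    nu = len(B) - 1
    return [sum(comb(nu - h, k - h) * B[h] for h in range(k + 1))
            for k in range(nu + 1)]

# A_0 ... A_3 for a hypothetical three-set with r12 = r13 = r23 = 0.5.
A = [1.0, 3.0, 2.25, 0.5]
B = b_from_a(A)

# Final check in the spirit of (2. 8): the last B must equal the hollow
# determinant, which for three variates is 2 * r12 * r13 * r23.
hollow_determinant = 2 * 0.5 * 0.5 * 0.5
print(B, hollow_determinant)
print(a_from_b(B))  # should bring back the A's
```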

When the coefficients $A$ or $B$ are determined, the values of
the polynomial $P(\lambda)$ or, which amounts to the same, of
$Q(\zeta)$ are most easily computed by ordering the terms according
to the principle of "Chinese boxes", as follows:

(2. 10)   $Q(\zeta) = \{[(\zeta^2 + B_2)\zeta + B_3]\zeta + B_4\}\zeta + $ etc.

If a computing machine is available, which has an arrangement
for transferring mechanically the figure standing in the
"result" to the key-board, the computations by (2. 10) can be
done very quickly.
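
The nesting in (2. 10) is what is nowadays usually called Horner's scheme. A minimal sketch, with a hypothetical coefficient list and an invented function name, may make the order of the operations explicit.

```python
def q_nested(B, zeta):
    """Evaluate Q(zeta) = B_0 zeta^n + B_1 zeta^(n-1) + ... + B_n by nesting.

    Each step multiplies the running result by zeta and adds the next
    coefficient, which is the "Chinese boxes" ordering of (2. 10).
    """
    acc = 0.0
    for b in B:  # B = [B_0, B_1, ..., B_n], with B_0 = 1 and B_1 = 0
        acc = acc * zeta + b
    return acc

# Hypothetical coefficients: three variates, all correlations 0.5.
B = [1.0, 0.0, -0.75, 0.25]
for zeta in (1.0, 0.5, 0.0, -1.0):
    print(zeta, q_nested(B, zeta))
```

Only one multiplication and one addition are needed per coefficient, which corresponds to transferring the "result" back to the key-board after every step.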

It is a classical fact that the characteristic roots are proportional
to the square lengths of the main axes of the normal regression
ellipsoids fitted to the observed scatter. This is easily
seen as follows. The regression ellipsoids are defined by
$\sum_{ij} r^{-1}_{ij} \xi_i \xi_j = C$ where the $\xi_i$ are the scatter diagram coordinates,
$r^{-1}_{ij}$ the elements of the reciprocal of the correlation matrix, and
$C$ a parameter which is constant along a given ellipsoid in the
family. Let $c_i$ be the direction numbers for a given straight
line through origin. The equation of this line may be written
$\xi_i = K c_i$ where $K$ is a factor of proportionality. Determining
$K$ so as to get intersection with the ellipsoid we find
$K^2 = C / \sum_{ij} r^{-1}_{ij} c_i c_j$. The square distance from origin to this point
is consequently $\sum_i \xi_i^2 = \lambda C$ where

(2. 11)   $\lambda = \sum_i c_i^2 / \sum_{ij} r^{-1}_{ij} c_i c_j$.

Since a main axis of the ellipsoid is defined as a straight line
from origin in a direction that makes its length measured from
origin to the ellipsoid a minimum (for the short axes) or a
maximum (for the long axes), the problem is to seek the extremum
of $\lambda$. By partial derivation in the usual way this leads to
the equation $\sum_j (r_{ij} - \lambda e_{ij}) c_j = 0$ for the $c_j$, where $e_{ij}$ denotes the unit matrix.
In order that this system shall have a solution (other than the
trivial $c_j = 0$) it is necessary and sufficient that $\lambda$ is a zero of the
polynomial $P(\lambda)$ defined by (2. 1). Hence the characteristic roots
of the matrix $(r_{ij})$ must be proportional to the square lengths of
the main axes of the regression ellipsoids.
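
For two variates the statement can be checked by hand, since the characteristic roots are then $1 + r$ and $1 - r$. The following sketch, in which the correlation 0.6 and the function name are assumptions made for the example, scans the directions through origin and evaluates (2. 11) in each of them.

```python
import math

def axis_parameter(R_inv, c):
    """lambda of (2. 11): sum of c_i^2 over the quadratic form of the reciprocal matrix."""
    n = len(c)
    num = sum(ci * ci for ci in c)
    den = sum(R_inv[i][j] * c[i] * c[j] for i in range(n) for j in range(n))
    return num / den

r = 0.6                      # hypothetical correlation coefficient
d = 1.0 - r * r              # determinant of the two-rowed correlation matrix
R_inv = [[1.0 / d, -r / d],  # reciprocal of [[1, r], [r, 1]]
         [-r / d, 1.0 / d]]

# One direction per degree; the extrema fall at 45 and 135 degrees.
values = [axis_parameter(R_inv, (math.cos(t), math.sin(t)))
          for t in (math.pi * k / 180.0 for k in range(180))]
print(max(values), 1.0 + r)  # long axis
print(min(values), 1.0 - r)  # short axis
```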

It therefore seems plausible to conclude that the observed
scatter is systematically $\kappa$-fold collinear when $\kappa$ of the characteristic
roots are "very small". This method may for shortness
be called the ellipsoid method.

The main axes of the regression ellipsoid also have, as is
well known, an immediate connection with the reduction of
the variates to an uncorrelated form. Indeed, it is always
possible to find a homogeneous non-singular linear transformation,

(2. 12)   $z_k = \sum_i c_{ki} \xi_i$

such that the variates $z_k$ become uncorrelated. The moments
of the new variates will be

(2. 13)   $\sum_{ij} c_{hi} c_{kj} r_{ij}$

$r_{ij}$ being the moments of the old variates, which is the same as
their correlation coefficients, when the $\xi_i$ are normalised in
such a way that their sumsquares are unity. The problem is
thus only to reduce a quadratic form (namely the one built over
$r_{ij}$) to a sum of squares, and it is a classical fact that this is
always possible. It is even possible to do it in an infinity of
ways (one method of carrying the computations through by a
recurrence formula was indicated in Section 4 of my 1928 paper
"Correlation and Scatter . . ."). But if we impose the supplementary
condition that the transformation shall be orthogonal
then it is uniquely determined (when the observed correlation
matrix is nonsingular and the characteristic roots different).
Indeed, the condition of non-correlation for the variates $z_k$ is
expressed by

(2. 14)   $\sum_{ij} c_{hi} c_{kj} r_{ij} = \lambda_k e_{hk}$

$\lambda_k$ being certain numbers that it is not necessary to specify for
the moment. If we insert in the right member of (2. 14) the
condition of orthogonality, namely

(2. 15)   $\sum_j c_{hj} c_{kj} = e_{hk}$

and notice that $c_{hj}$ may be written $\sum_i c_{hi} e_{ij}$, (2. 14) takes the form

(2. 16)   $\sum_{ij} (r_{ij} - \lambda_k e_{ij}) c_{hi} c_{kj} = 0$.

Looking upon this as a system of equations of the form
$\sum_i c_{hi} u_i = 0$ $(h = 1, 2 \ldots n)$ where $(c_{hi})$ is a non-singular matrix, we
see that we must have

(2. 17)   $\sum_j (r_{ij} - \lambda_k e_{ij}) c_{kj} = 0$   $(i = 1, 2 \ldots n)$.

This shows that the $\lambda_k$ are nothing but the characteristic
numbers of the correlation matrix and $c_{kj}$ the direction numbers
for the $k$-th main axis of the regression ellipsoid. In other
words the transformed uncorrelated variates $z_k$ are nothing but
the coordinates of the scatter points measured along the main
axes of the distribution ellipsoid.

By (2. 12) the variates $z$ are expressed in terms of the $\xi$.
Inversely the $\xi$ are expressed in terms of the $z$ by the formula

(2. 18)   $\xi_i = \sum_k c_{ki} z_k$

This follows from the fact that the reciprocal of an orthogonal
matrix is simply its transposed. It should be remembered that
the coefficients in (2. 18) are by (2. 15) normalised in such a
way that

(2. 19)   $\sum_i c_{ki}^2 = 1$

If we accept the above mentioned criterion that the observed
scatter is systematically $\kappa$-fold collinear when $\kappa$ of the characteristic
roots are "very small", then we would retain only
the $n - \kappa$ terms in the right member of (2. 18) which correspond
to the $n - \kappa$ characteristic roots judged to be significantly different
from zero. For instance, if only one root, namely $\lambda_1$, is
deemed significantly different from zero, we would put

(2. 20)   $\xi_i = z_1 \cdot c_{1i}$

where $c_{1i}$ are the direction numbers determined from (2. 17)
(for $k = 1$), and normalised according to (2. 19). If two roots
$\lambda_1$ and $\lambda_2$ are significantly different from zero and significantly
different from one another, we would put

(2. 21)   $\xi_i = z_1 \cdot c_{1i} + z_2 \cdot c_{2i}$

etc.
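
The chain (2. 12) to (2. 21) may be followed on a small case in which the main axes are known in closed form. The sketch below rests on assumptions made only for the illustration: a hypothetical three-variate matrix with $r_{12} = r_{23} = 0.5$ and $r_{13} = 0$, whose roots are $1 + \sqrt{2}/2$, $1$ and $1 - \sqrt{2}/2$, and a hypothetical observation point.

```python
import math

a = 0.5  # hypothetical value of r12 = r23, with r13 = 0
R = [[1.0, a, 0.0],
     [a, 1.0, a],
     [0.0, a, 1.0]]
n = 3
s = math.sqrt(0.5)

# Characteristic roots and normalised direction numbers c_ki (row k = k-th main axis).
roots = [1.0 + a * math.sqrt(2.0), 1.0, 1.0 - a * math.sqrt(2.0)]
C = [[0.5, s, 0.5],
     [s, 0.0, -s],
     [0.5, -s, 0.5]]

def moments(C, R):
    """sum_ij c_hi c_kj r_ij for all h, k: the left member of (2. 14)."""
    return [[sum(C[h][i] * C[k][j] * R[i][j] for i in range(n) for j in range(n))
             for k in range(n)] for h in range(n)]

M = moments(C, R)  # should be diagonal, with the roots on the diagonal

xi = [0.8, -0.3, 0.4]  # hypothetical observation in normal coordinates
z = [sum(C[k][i] * xi[i] for i in range(n)) for k in range(n)]          # (2. 12)
xi_back = [sum(C[k][i] * z[k] for k in range(n)) for i in range(n)]     # (2. 18)
xi_one_axis = [z[0] * C[0][i] for i in range(n)]                        # (2. 20)
xi_two_axes = [z[0] * C[0][i] + z[1] * C[1][i] for i in range(n)]       # (2. 21)

print(M)
print(xi_back, xi_one_axis, xi_two_axes)
```

The truncated expressions (2. 20) and (2. 21) reproduce the observation only in so far as the discarded coordinates $z_k$ are negligible, which is precisely what the criterion of "very small" roots is meant to decide.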


Theoretically this method seems very promising, but in 
practice I have found that, at least so far as the study of the
unfolding capacity of the scatter in various kinds of economic 
data is concerned, it does not lead to any more conclusive re- 
sults than the scatterance method. In studying the set of cha- 
racteristic roots for a given correlation matrix we are indeed 
confronted with just the same kind of difficulty as in the study 
of scatterances. The big question also here is: Are the para- 
meters considered systematically different from zero, or are they 
pushed away from zero just by the disturbances of the data? 
The usual sampling errors do not answer this question. And 
in the ellipsoid method we have in addition the question of 
whether the roots considered are systematically different from 
each other. It will presently appear that this latter question 
may in practice be particularly troublesome. 

As an example of the difficulties inherent in the ellipsoid 
method I shall mention a study of an eight-variate problem 
which was undertaken jointly by Dr. Frederick V. Waugh, of 
the U. S. Department of Agriculture, and me during Dr. 
Waugh's work at the Institute in Oslo and in my seminar in
1932. I described this work in my lectures at the Institut
Henri Poincaré in Paris in the Spring of 1933.

The object of the study was to see how the price obtained 
was influenced by the quality of potatoes in a certain New
England market. The variates studied were: 

1. Wholesale price. 

2. Percentage of potatoes having a size between 
1 3/4" - 2 1/4". 

3. Percentage of misshapen potatoes. 

4. Index of colour of the skin. 

5. Index of bruises. 

6. Cuts. 

7. Scab. 

8. Other features. 

In this problem the question of confluency is of course of
paramount importance. We may for instance ask: Will the 
variate No. 5: Bruises represent an independent contribution 
towards the determination of the price, or is its influence 
already taken account of through some of (or all of) the other 
variates? 

The correlation coefficients were as indicated in Table (2. 22). 
For the purpose of the subsequent experimental computation 
the correlations are here given with six decimal places, but
only two or at the utmost three places are statistically significant.


TABLE (2. 22). POTATO DATA. GROSS CORRELATION COEFFICIENTS $r_{ij}$.

              j = 1          2          3          4          5          6          7          8
i = 1      1.000000  -0.210802  -0.215275  -0.452059  -0.315211  -0.198522  -0.228210  -0.279094
    2                 1.000000   0.102185   0.014595  -0.054787   0.081614   0.134882   0.094089
    3                            1.000000   0.264599   0.269739   0.214237   0.206190   0.309791
    4                                       1.000000   0.443629   0.163629   0.254723   0.303490
    5                                                  1.000000   0.072961   0.198672   0.299576
    6                                                             1.000000   0.205694   0.407260
    7                                                                        1.000000   0.361637
    8                                                                                   1.000000


Instead of computing just the eight values representing
the characteristic roots we studied the whole shape of the
polynomial $P(\lambda)$ defined by (2. 1). This gives a much clearer
view of the situation. The curve $P(\lambda)$ may then be looked upon
as a sort of spectrum for the matrix. (2. 23) summarises the
values of $P(\lambda)$ which were computed during the work.

Drawn on a reasonably large scale the curve $P(\lambda)$ appears
as in Figure 1.





TABLE (2. 23). VALUES OF THE CHARACTERISTIC POLYNOMIAL FOR
THE POTATO DATA.

$\lambda$      $P(\lambda)$            $\lambda$      $P(\lambda)$

0.000       0.28270786        0.902      -0.00000014
0.1         0.09045032        1.0        -0.00000547
0.2         0.02276385        1.01        0.00000690
0.3         0.00384628        1.1         0.00014828
0.4         0.00027373        1.15        0.00003985
0.475       0.00000271        1.155       0.00000629
0.48        0.00000101        1.16       -0.00003312
0.484       0.00000016        1.17       -0.00013244
0.485       0.00000001        1.2        -0.00063999
0.486      -0.00000012        1.3        -0.00692878
0.49       -0.00000045        2.0        -2.63268630
0.5        -0.00000021        2.3        -6.99993462
0.502      -0.00000004        2.6        -4.91675881
0.503       0.00000007        2.65       -1.88389024
0.55        0.00000508        2.673      -0.02001710
0.57        0.00000431        2.675       0.15796210
0.59        0.00000163        2.7         2.61512460
0.6        -0.00000018
0.61       -0.00000210
0.7        -0.00000938
0.75       -0.00000293
0.76       -0.00000166
0.77       -0.00000068
0.78        0.00000000
0.79        0.00000021
0.798       0.00000001
0.8        -0.00000001
0.9        -0.00002836
The zero at B is obviously quite significant. Its exact value
is $\lambda = 2.6732264$, which gives the following direction numbers
for the corresponding main axis

(2. 24)
$c_{11} = -0.398108$
$c_{12} = -0.129359$
$c_{13} = 0.345343$
$c_{14} = -0.420703$
$c_{15} = 0.368630$
$c_{16} = -0.301462$
$c_{17} = 0.344614$
$c_{18} = -0.427782$

If we only take account of this first zero, the price
(in normal coordinates) would be expressed as

(2. 25)   $\xi_1 = -0.4 z_1$

where $z_1$ is some sort of (negative) standard expression for
"quality", so defined that a change in this standard implies a
simultaneous change in all the individual quality indices listed
above as variates Nos. 2 to 7. The intensity of the change in
the individual indices with a change in $z_1$ is defined by the
coefficients (2. 24). For instance, 1 unit increase in the
(negative) quality index $z_1$ means 0.13 units decrease in the
variate No. 2 and 0.35 units increase in the variate No. 3 etc.

If we are not satisfied with this definition of the quality 
standard and want to study the possible independent influence 
on price exerted by some of the special quality-features, we 
must look for further characteristic roots. And here is where 
the weakness of the ellipsoid method will become apparent. 

From Figure 1 it looks as if there is only one other zero than
B, namely one located at A. From theory we know however
that there ought to be eight real zeros. The explanation is
that the point A, which on the scale used in Figure 1 appears
only as one zero, contains in reality seven zeros. If the
vicinity around A is drawn on a larger scale we get the picture
given in Figure 2. The values of the zero points are given in
(2. 26).

TABLE (2. 26). CHARACTERISTIC ZEROS.
(In descending order of magnitude.)

$\lambda$ = 2.67322
     1.15579
     1.00443
     0.7983
     0.78
     0.5990
     0.5024
     0.4851

In order to get an impression of how completely the zeros
around A are mixed, from the practical view-point, we
may notice that if the complete curve is drawn on the scale
used in Figure 2, the bottom which in Figure 1 is marked C
would be about 4 kilometres away.

The same thing may also be recognized as follows.
Roughly speaking we can say that if not more than three decimal
places are significant in the values of the original correlation
coefficients, anything beyond the third decimal place
in the values of the characteristic polynomial is not significant.
Of course, this rule is not quite exact, but is sufficient for our
present purpose. Now, over the range from $\lambda = 0.48$ to $\lambda = 1.2$
where all the first seven of the characteristic roots are located,
the ordinate of the characteristic polynomial $P(\lambda)$ never
reaches above + 0.00015 and never below - 0.00003. The situation
is exhibited in Figure 3. Over the interval in question
the ordinate of the characteristic polynomial never reaches
out of the thickness of the heavy line in Figure 3
while the dotted line indicates that level which the ordinate
must have reached if its value should have been significant
(and even this is interpreting the significance of the original
correlation coefficients rather liberally).

[Fig. 3.]

This means that in practice we cannot speak of certain definite
"zeros" of $P(\lambda)$ at all over the range from $\lambda = 0.48$ to $\lambda = 1.2$.
Over this range the function $P(\lambda)$ has for all practical purposes
simply a continuous contact with the $\lambda$-axis. This being the
situation, the determination of the main axes of the ellipsoid,
other than the one given by (2. 24), would be entirely
meaningless.

This indeterminateness of the main axes may also be studied
by means of certain subcharacteristic polynomials in the following
way. The direction numbers $c_{k1}, c_{k2} \ldots c_{kn}$ for the main axis
corresponding to the zero $\lambda_k$ are determined by (2. 17). This
means that if we let

(2. 27)   $G_1(\lambda) \ldots G_n(\lambda)$

be the polynomials of $\lambda$ that occur as the elements in any row
of the adjoint of (2. 1) (or in any column, since (2. 1) is symmetric),
then the $n$ direction numbers corresponding to a given
zero $\lambda_k$ are obtained simply as the values assumed by the $n$
polynomials (2. 27) when $\lambda$ is put equal to $\lambda_k$.

Since (2. 1) is a singular matrix in each of the zeros $\lambda_k$,
and hence its adjoint of rank not higher than 1, the absolute
value of the direction numbers can also be determined by
putting the squares of the direction numbers proportional to the
polynomials

(2. 28)   $P_i(\lambda)$

$P_i(\lambda)$ being the characteristic polynomial for the $(n - 1)$-dimensional
set obtained by leaving out the variate No. $i$. $P_i(\lambda)$ may
be called the subcharacteristic polynomials for the big matrix.
Let

(2. 29)   $\hat{r}_{ij}(\lambda)$

be the elements of the adjoint of (2. 1), considered as functions
of $\lambda$. Since these functions are symmetric in $(ij)$, and since the
adjoint of a singular matrix is of rank not higher than 1, we
have in any of the points $\lambda = \lambda_k$

(2. 30)   $P_i(\lambda_k) \cdot P_j(\lambda_k) = [\hat{r}_{ij}(\lambda_k)]^2$.

This shows that in any of the points $\lambda = \lambda_k$ all the subcharacteristic
polynomials defined by (2. 28) must have the same sign.
In the point representing the smallest of the numbers $\lambda_k$ all
these polynomials must even be non-negative (since for values
of $\lambda$ smaller than this (2. 1) is positive definite). The polynomials
$G_i$ need not satisfy any such sign condition since they are not
principal (but skew) minors of (2. 1).
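
The relation (2. 30) and the sign rule that follows from it can be verified on a small case. The sketch below assumes a hypothetical three-variate matrix with $r_{12} = r_{23} = 0.5$, $r_{13} = 0$, whose characteristic roots are known in closed form; the function names are likewise invented for the illustration.

```python
import math

def det(m):
    """Determinant by expansion along the first row (adequate for small sets)."""
    if not m:
        return 1.0
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def drop(m, i, j):
    """Matrix m with row i and column j left out."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(m) if r != i]

def adjoint(m):
    """Adjoint: element (i, j) is the signed minor obtained by dropping row j, column i."""
    n = len(m)
    return [[(-1) ** (i + j) * det(drop(m, j, i)) for j in range(n)] for i in range(n)]

a = 0.5  # hypothetical value of r12 = r23, with r13 = 0
R = [[1.0, a, 0.0], [a, 1.0, a], [0.0, a, 1.0]]
n = len(R)
roots = [1.0 + a * math.sqrt(2.0), 1.0, 1.0 - a * math.sqrt(2.0)]

def shifted(lam):
    """The matrix of the determinant (2. 1) for a given lambda."""
    return [[R[i][j] - (lam if i == j else 0.0) for j in range(n)] for i in range(n)]

def sub_polys(lam):
    """P_i(lambda): leave out variate i, then evaluate the characteristic polynomial."""
    m = shifted(lam)
    return [det(drop(m, i, i)) for i in range(n)]

gaps = []
for lam_k in roots:
    adj = adjoint(shifted(lam_k))
    P = sub_polys(lam_k)
    gaps.append(max(abs(P[i] * P[j] - adj[i][j] ** 2)
                    for i in range(n) for j in range(n)))

last_column_at_middle_root = [adjoint(shifted(1.0))[i][n - 1] for i in range(n)]
print(gaps)                        # (2. 30): zero up to rounding in every root
print(sub_polys(min(roots)))       # non-negative in the smallest root
print(last_column_at_middle_root)  # proportional to the direction numbers (1, 0, -1)
```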

A high degree of indeterminateness will be attached to the
main axes when the direction numbers determined either by
$G_i$ or by $P_i$ are small. By plotting either the curves $G_i$ or the
curves $P_i$ over the whole range where the zeros are located,
we therefore get a good impression of the degree to which the
main axes have any meaning. They will be significant only
to the extent that the polynomials in question ($G_i$ or $P_i$) in
the vicinity of the characteristic zeros reach up above the
ordinates representing one unit of the third decimal place.
Figure 4 exhibits the polynomials $G_i$ in the potato data, $G_i(\lambda)$
being here conventionally chosen as the elements in the last
column of the adjoint of (2. 1), in other words as

(2. 31)   $G_i(\lambda) = \hat{r}_{i8}(\lambda)$.

The values of these polynomials in the characteristic points $\lambda_k$ are
given in (2. 32); and the values of the polynomials $P_i(\lambda)$ in the same
points are given in (2. 23). Both these tables show very convincingly that
it is only in the characteristic point $\lambda = 2.67$ that the main axes
have any significance.

This confirms the conclusion that all the seven characteristic roots
at A in Figure 1 must for all practical purposes be interpreted as
lying in a cluster, without any possibility of
discriminating between them. Since it is out of the question to
assume them all to be significantly different from zero, the
conclusion of the ellipsoid analysis must be that it is only the
root at B that can be looked upon as significantly different
from zero. Consequently the price must by this method be
looked upon as capable of being expressed in terms of one
single quality index as indicated in (2. 25).

Table (2. 32). VALUES OF POLYNOMIALS $G_i(\lambda)$ IN THE CHARACTERISTIC POINTS.

This result is however in contradiction both with the concrete 
knowledge we have of the data and with the impression one 
gets by a detailed cross classification on the individual observa- 
tions. The general knowledge of the data as well as the cross 
classification indicates that there is more than one significant 
degree of freedom. Some further support for this conclusion 
will also be found by the analysis of Section 31 based on the 
’’bunch” technique. The bunch analysis will also give some 
indication of which one of the observational variates are the 
most important to include. 

In view of this negative conclusion regarding the usefulness 
of the ellipsoid method I decided to postpone publication of my 
work on the ellipsoid method until I had a better method to 
offer by which the ellipsoid method could be compared.¹

3. MINIMAL $\lambda$-ROOTS AND MAXIMAL $\zeta$-ROOTS.

If the characteristic roots are to be used at all as criteria of 
confluency, it seems better to compute only one of them, namely 
the minimal root, but then do this for all possible subsets. By 
doing this one dodges at least that difficulty which consists in 
judging whether the roots are significantly different from one 
another. The computation of the minimal roots gives an alter- 
native set of parameters to study instead of the scatterances, 
each scatterance is simply replaced by its minimal root. For 
the systematic computation of these roots in all possible subsets 
I have worked out the following technique which has been 
found convenient and has been used in all the numerical work 
done along this line at the Oslo Institute. 

¹ On other types of data the ellipsoid method may perhaps be used with
advantage. Professor Harold Hotelling in his highly interesting paper "Analysis
of a Complex of Statistical Variables into Principal Components", The
Journal of Educational Psychology, 1933, has applied it to psychological data.
In this paper he also gives a method of successive approximation to the
characteristic roots which will work well, it seems, whenever the roots are
significantly distinct. I wonder whether his method will work equally well
when the roots lie close together as in the above eight variate problem.

Instead of considering the minimal $\lambda$-roots, it is for the
practical computations more convenient to consider the corresponding
maximal $\zeta$-roots. We shall therefore consider the
polynomial $Q(\zeta)$ defined by (2. 3) instead of the polynomial
$P(\lambda)$ defined by (2. 1).

Let $\zeta$ and $\zeta_i$ be the maximal roots of $Q(\zeta)$ and $Q_i(\zeta)$ respectively,
$Q_i(\zeta)$ being the subcharacteristic polynomial obtained by
leaving out the variate No. $i$.

The shape of the functions $Q(\zeta)$ and $Q_i(\zeta)$ in the first range
to the left of unity must be as indicated by the curves A and
B respectively in Figure 5. (The curve A in Figure 5 is constructed
by the formula $Q(\zeta) = 0.25 + 0.37\zeta - 1.53\zeta^2 + \zeta^4$, the curve
B by the formula $Q_i(\zeta) = -0.15 - 0.59\zeta + \zeta^3$. The numerical constants
here refer to an actual example.) We are in particular
interested in the relative location of the zeros.

[Fig. 5.]

In order to study this we first notice that (2. 1) is a positive
definite determinant for $\zeta = 1$ and all the minors continuous
in $\zeta$; therefore as $\zeta$ decreases from 1, $Q(\zeta)$, when written as determinant,
must, to begin with, remain positive definite. As we
move from $\zeta = 1$ towards the left, none of the polynomials $Q_i$ can
therefore vanish before $Q$. In other words: The maximal root in a set
of variates is never less than the maximal root in any of its subsets.

As we move towards the left, there will come a point where
for the first time one of the polynomials $Q_i$ vanishes. Let $\zeta = \zeta^0$
be the point where this happens. Differentiating the determinant
(2. 1) we get

(3. 1)   $\frac{dQ(\zeta)}{d\zeta} = \sum_i Q_i(\zeta)$

This shows that, up to the point $\zeta^0$, $Q$ must certainly be
monotonically decreasing with decreasing $\zeta$. Consequently:
If $\zeta$ is the maximal root in a certain set, and $\zeta^0$ the maximal
root in that one of its subsets which has the largest maximal
root, we have

(3. 2)   $\zeta^0 \le \zeta \le 1$

Furthermore: In the range (3. 2) the polynomial $Q(\zeta)$ has exactly one
zero. This gives a very convenient method of successively
approximating to the maximal roots in all possible subsets of a
given big set. First the maximal roots in the three dimensional
sets are computed directly by the usual formula for the
solution of a third degree equation. This is easy since the
second power of $\zeta$ is lacking in the equation. For higher sets
one then proceeds as follows. Since $Q$ is non-negative in
the point $\zeta = 1$ and non-positive in $\zeta^0$ and has exactly one
zero in this range, we can by linear interpolation take

(3. 3)   $\zeta' = \zeta^0 - \frac{\zeta^0 - 1}{Q(\zeta^0) - Q(1)}\, Q(\zeta^0)$

as a first approximation to the maximal root in the bigger set.
Of course $Q(1)$ is nothing but $\Delta$. A second approximation
will be

(3. 4)   $\zeta'' = \zeta' - \frac{\zeta' - \zeta^0}{Q(\zeta') - Q(\zeta^0)}\, Q(\zeta')$

etc.

The computations are ordered in a difference scheme as indicated
in (3. 5).


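Table (3. 5). DIFFERENCE SCHEME FOR THE COMPUTATION OF
MAXIMAL ROOTS.

$\zeta$       $Q$             Difference of $\zeta$     Difference of $Q$           Divided difference of $\zeta$ with respect to $Q$

1.000         $\Delta$
$\zeta^0$     $Q(\zeta^0)$    $\zeta^0 - 1.000$         $Q(\zeta^0) - \Delta$       $(\zeta^0 - 1.000) / (Q(\zeta^0) - \Delta)$
$\zeta'$      $Q(\zeta')$     $\zeta' - \zeta^0$        $Q(\zeta') - Q(\zeta^0)$    $(\zeta' - \zeta^0) / (Q(\zeta') - Q(\zeta^0))$
etc.

The computation of the values needed of the polynomial
$Q(\zeta)$ in Table (3. 5) is done most conveniently by the formula
(2. 10).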
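
The whole procedure of this Section may be sketched for a small case. The sketch below is an illustration under assumptions: the four-variate correlation matrix is hypothetical (its maximal $\zeta$-root is known to be $\cos(\pi/5)$), the three dimensional sets are solved by the trigonometric form of the solution of a third degree equation without second power, and the function names are invented for the example.

```python
import math
from itertools import combinations

def det(m):
    """Determinant by expansion along the first row (adequate for small sets)."""
    if not m:
        return 1.0
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def hollow_coefficients(R):
    """[B_0, ..., B_n]: sums of principal minors in the hollow determinant."""
    n = len(R)
    H = [[0.0 if i == j else R[i][j] for j in range(n)] for i in range(n)]
    return [sum(det([[H[i][j] for j in idx] for i in idx])
                for idx in combinations(range(n), k)) for k in range(n + 1)]

def q_nested(B, zeta):
    """Q(zeta) by the nesting (2. 10)."""
    acc = 0.0
    for b in B:
        acc = acc * zeta + b
    return acc

def max_root_three_set(r12, r13, r23):
    """Largest zero of zeta^3 + p*zeta + q, with p = B_2 and q = B_3 of the three-set."""
    p = -(r12 * r12 + r13 * r13 + r23 * r23)
    q = 2.0 * r12 * r13 * r23
    if p == 0.0:
        return 0.0
    arg = max(-1.0, min(1.0, (3.0 * q / (2.0 * p)) * math.sqrt(-3.0 / p)))
    return 2.0 * math.sqrt(-p / 3.0) * math.cos(math.acos(arg) / 3.0)

def max_root_by_interpolation(B, zeta0, tol=1e-13, max_iter=60):
    """Repeated linear interpolation (3. 3), (3. 4), starting from zeta = 1 and zeta0."""
    x_prev, q_prev = 1.0, q_nested(B, 1.0)
    x, q = zeta0, q_nested(B, zeta0)
    for _ in range(max_iter):
        if abs(q) < tol or q == q_prev:
            break
        x_next = x - (x - x_prev) / (q - q_prev) * q
        x_prev, q_prev = x, q
        x, q = x_next, q_nested(B, x_next)
    return x

# Hypothetical four-variate set: neighbouring variates correlated 0.5, others 0.
R = [[1.0, 0.5, 0.0, 0.0],
     [0.5, 1.0, 0.5, 0.0],
     [0.0, 0.5, 1.0, 0.5],
     [0.0, 0.0, 0.5, 1.0]]

zeta0 = max(max_root_three_set(R[i][j], R[i][k], R[j][k])
            for i, j, k in combinations(range(4), 3))
B = hollow_coefficients(R)
zeta = max_root_by_interpolation(B, zeta0)
print(zeta0, zeta, 1.0 - zeta)  # the last figure is the minimal lambda-root
```

Each pass through the loop corresponds to one further row of the difference scheme (3. 5).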

The scatterances and minimal roots have been computed at
the Institute for a large series of data, primarily American consumption
statistics, which were collected and brought in shape
for this analysis by Dr. Waugh.

If the scatterances and minimal roots shall have any use as
tests of linear confluency, the conclusions reached by using
these two different sets of criteria ought to be essentially the
same. Although no exact correspondence was found, yet in
most cases the discrepancy was not very great. In general
there was a definite tendency for the maximal $\zeta$-root to be
large when the scatterance was small and vice versa. The
kind of information yielded by these two sets of parameters
may therefore be looked upon as roughly the same.

The graph in Figure 6 indicates the connection between
scatterances and maximal roots in a set of the above
mentioned American consumption data. (Butter 1919-31).

4. LEMMAS ON CERTAIN PROPERTIES OF DETERMINANTS. THE 
HEAD COEFFICIENTS. 

In the following analysis we shall have to make use of a
few properties of determinants which it will be convenient to
indicate here.¹

¹ A more complete summary of the classical facts in quadratic forms and
matrix algebra that are of particular interest from the view point of application
to economics and statistics, was given in my Colloquium Lectures at the
meeting of the Econometric Society in Leyden 1933. A mimeographed account
of these lectures based on my own notes and on notes taken by Mr. M. H. Belz
during the lectures, is available and may be ordered through the Economic
Institute, Oslo. (Price Kr. 4.00.)

[Fig. 6. Consumption data.]

(2. 1) with the expansion (2. 2) is the characteristic polynomial
for the matrix $r_{ij}$. The adjoint of $r_{ij}$, namely the matrix

(4. 1)   $\hat{r}_{ij}$

also has a characteristic polynomial. Let it be $\hat{P}(\lambda)$, and let $\hat{A}_k$ be
its coefficients. Then it is a classical fact in matrix algebra
(contained as a special case in Frobenius' theorem) that

(4. 2)   $\hat{A}_k = \Delta^{k-1} A_{n-k}$

In other words we have

(4. 3)   $\hat{P}(\lambda) = \Delta^{n-1} - \Delta^{n-2} A_1 \lambda + \Delta^{n-3} A_2 \lambda^2 - \ldots + (-)^{n-1} A_{n-1} \lambda^{n-1} + (-)^n \lambda^n$

As an example of (4. 2) we may take the case $n = 2$. Here
the characteristic polynomial is

$P(\lambda) = \begin{vmatrix} r_{11}-\lambda & r_{12} \\ r_{21} & r_{22}-\lambda \end{vmatrix} = \Delta_{12} - (r_{11} + r_{22})\lambda + \lambda^2$.

And the adjoint characteristic polynomial is

$\hat{P}(\lambda) = \begin{vmatrix} r_{22}-\lambda & -r_{12} \\ -r_{21} & r_{11}-\lambda \end{vmatrix} = \Delta_{12} - (r_{11} + r_{22})\lambda + \lambda^2$.

In other words for $n = 2$, the characteristic polynomial and the
adjoint characteristic polynomial coincide, which checks with
(4. 2).

In the case $n = 3$ the adjoint characteristic polynomial is

$\hat{P}(\lambda) = \begin{vmatrix} (r_{22}r_{33} - r_{32}r_{23}) - \lambda & -(r_{12}r_{33} - r_{32}r_{13}) & (r_{12}r_{23} - r_{22}r_{13}) \\ -(r_{21}r_{33} - r_{31}r_{23}) & (r_{11}r_{33} - r_{31}r_{13}) - \lambda & -(r_{11}r_{23} - r_{21}r_{13}) \\ (r_{21}r_{32} - r_{31}r_{22}) & -(r_{11}r_{32} - r_{31}r_{12}) & (r_{11}r_{22} - r_{21}r_{12}) - \lambda \end{vmatrix}$

Writing this determinant out and collecting the terms we find
that the expression reduces to

$\hat{P}(\lambda) = \Delta_{123}^2 - \Delta_{123}(r_{11} + r_{22} + r_{33})\lambda + (\Delta_{12} + \Delta_{13} + \Delta_{23})\lambda^2 - \lambda^3$

which also checks with (4. 2).
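
The relation between the two sets of coefficients, in the form $\hat{A}_k = \Delta^{k-1} A_{n-k}$, may also be checked numerically for a higher set. The sketch below uses a hypothetical four-variate correlation matrix and invented function names, and forms all coefficients by brute force from principal minors.

```python
from itertools import combinations

def det(m):
    """Determinant by expansion along the first row (adequate for small sets)."""
    if not m:
        return 1.0
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def principal_minor_sums(m):
    """[S_0, ..., S_n], S_k being the sum of all k-rowed principal minors."""
    n = len(m)
    return [sum(det([[m[i][j] for j in idx] for i in idx])
                for idx in combinations(range(n), k)) for k in range(n + 1)]

def adjoint(m):
    """Adjoint: element (i, j) is the signed minor obtained by dropping row j, column i."""
    n = len(m)
    def minor(row, col):
        return det([[m[r][c] for c in range(n) if c != col]
                    for r in range(n) if r != row])
    return [[(-1) ** (i + j) * minor(j, i) for j in range(n)] for i in range(n)]

# Hypothetical four-variate correlation matrix.
R = [[1.00, 0.30, 0.20, 0.10],
     [0.30, 1.00, 0.40, 0.25],
     [0.20, 0.40, 1.00, 0.50],
     [0.10, 0.25, 0.50, 1.00]]
n = len(R)

A = principal_minor_sums(R)               # coefficients of the matrix itself
A_hat = principal_minor_sums(adjoint(R))  # coefficients of its adjoint
delta = A[n]

gaps = [abs(A_hat[k] - delta ** (k - 1) * A[n - k]) for k in range(1, n + 1)]
print(gaps)  # zero up to rounding
```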

For various purposes the three first coefficients of the
characteristic polynomial are of special interest; we shall call
them the head coefficients and designate them by special letters

(4. 4)   $\Delta = A_n$,   $\Phi = A_{n-1}$,   $\Psi = A_{n-2}$

Thus $\Delta$ is the determinant (1. 15) itself, $\Phi$ the sum of all its
$n$ $(n-1)$-rowed principal minors, and $\Psi$ the sum of all its $\binom{n}{2}$ $(n-2)$-rowed
principal minors. In a given subset $(\alpha, \beta \ldots \gamma)$ we may
use the notation $\Delta_{(\alpha\beta\ldots\gamma)}$, $\Phi_{(\alpha\beta\ldots\gamma)}$, $\Psi_{(\alpha\beta\ldots\gamma)}$. And in the set obtained from
a given set by leaving out, say, the subscript $\alpha$, we may use
the notation $\Delta_{)\alpha(}$, $\Phi_{)\alpha(}$ etc.

This being so, let us consider any set of $\nu$ affixes $(\alpha, \beta \ldots \gamma)$
and let us form the adjoint correlation matrix $\hat{r}_{ij(\alpha\beta\ldots\gamma)}$, the
adjunction being taken within the set $(\alpha, \beta \ldots \gamma)$. For shortness
let us denote this $\nu$-rowed matrix

(4. 5)   $s_{ij} = \hat{r}_{ij(\alpha\beta\ldots\gamma)}$

Consider the characteristic polynomial for the $\nu$-rowed matrix
$s$, let $\Delta^{(s)}$ and $\Phi^{(s)}$ be its first two head coefficients. Further,
let $\Delta^{(s)}_{)p(}$ and $\Phi^{(s)}_{)p(}$ be the head coefficients in the $(\nu - 1)$-rowed
matrix obtained from $s$ by leaving out the row No. $p$ and the
column No. $p$. We shall study the difference $\Phi^{(s)} \Delta^{(s)}_{)p(} - \Delta^{(s)} \Phi^{(s)}_{)p(}$.

To evaluate this expression we need Sylvester's formula on
the minors in an adjoint. This formula may be written

(4. 6)   $|\hat{a}|_{hk\ldots l \cdot uv\ldots w} = \varepsilon \cdot |a|_{)uv\ldots w \cdot hk\ldots l(} \cdot |a|^{m-1}$

$\varepsilon$ being the sign factor $(-)^{h+k+\ldots+l+u+v+\ldots+w}$. $|a|$ is a
given $n$-rowed determinant, the expression in the left member of
(4. 6) stands for the $m$-rowed determinant consisting of the rows
Nos. $h, k \ldots l$ and the columns Nos. $u, v \ldots w$ from the adjoint
of $a$. The first factor in the right member indicates the $(n-m)$-rowed
determinant which is obtained when the rows $u, v \ldots w$
and the columns $h, k \ldots l$ are omitted from $|a|$ itself; the second
factor in the right member is simply the $(m-1)$th power of the
$n$-rowed determinant $|a|$.

This being so consider $\Delta^{(s)}_{)p(}$. It may be looked upon as
obtained in the following way. We take as a starting point the
$\nu$-rowed matrix of the original correlation coefficients in the
set $(\alpha, \beta \ldots \gamma)$. Of this matrix we take the adjoint, the adjunction
being made within the set $(\alpha, \beta \ldots \gamma)$. In this $\nu$-rowed adjoint we
consider the $(\nu - 1)$-rowed minor obtained by leaving out the
row and the column $p$.

By Sylvester's formula this minor is equal to $\Delta^{\nu-2} r_{pp}$. Further,
by (4. 3) we have $\Phi^{(s)} = \Delta^{\nu-2} \sum_x r_{xx}$ ($x$ running through $\alpha, \beta \ldots \gamma$).
On the other hand we have $\Phi^{(s)}_{)p(} = \sum_x |s|_{)px \cdot px(}$ which by Sylvester's
formula is equal to $\Delta^{\nu-3} \sum_x \begin{vmatrix} r_{pp} & r_{px} \\ r_{xp} & r_{xx} \end{vmatrix}$. Finally by Sylvester's
formula $\Delta^{(s)} = \Delta^{\nu-1}$, so that

(4. 7)   $\Phi^{(s)} \Delta^{(s)}_{)p(} - \Delta^{(s)} \Phi^{(s)}_{)p(} = \Delta^{2\nu-4} [r_{pp} \sum_x r_{xx} - \sum_x (r_{pp} r_{xx} - r_{px}^2)] = \Delta^{2\nu-4} \sum_x r_{px}^2$.

Now let us take the adjoint of the $\nu$-rowed matrix $s_{ij}$. Apart from
a factor of proportionality this brings us back again to the
original matrix $r_{ij}$; more precisely we have

(4. 8)   $\hat{s}_{ij} = r_{ij} \cdot \Delta^{\nu-2}$.

Inserting this in the right member of (4. 7) this member takes
on the form $\sum_x \hat{s}_{px}^2$. The formula thus obtained holds good for any
matrix $s$, it consequently also holds good for the original matrix
$r$, hence

(4. 9)   $\sum_x \hat{r}_{px}^2 = \Phi \Delta_{)p(} - \Delta \Phi_{)p(}$

where the head coefficients $\Delta$ and $\Phi$ are now taken in the set
$(\alpha, \beta \ldots \gamma)$ of the original correlation matrix; $x$ in (4. 9) runs
through $\alpha, \beta \ldots \gamma$.

As an example consider a three-rowed correlation matrix.
The elements in the first row of the adjoint are here

$\hat r_{11} = 1 - r_{23}^2$, $\hat r_{12} = -(r_{12} - r_{13} r_{23})$, $\hat r_{13} = (r_{12} r_{23} - r_{13})$

The sum of the squares of these three quantities is

$1 - 2 r_{23}^2 + r_{23}^4 + r_{12}^2 - 2 r_{12} r_{13} r_{23} + r_{13}^2 r_{23}^2 + r_{12}^2 r_{23}^2 - 2 r_{12} r_{13} r_{23} + r_{13}^2$

On the other hand the right member of (4.9) is in this case

$(1 - r_{12}^2 + 1 - r_{13}^2 + 1 - r_{23}^2)(1 - r_{23}^2) - 2[\,1 + 2 r_{12} r_{13} r_{23} - (r_{12}^2 + r_{13}^2 + r_{23}^2)\,]$.

It is easily seen that these two expressions are equal.

By taking the sum of (4.9) over $p$ and noticing that

(4.10) $\sum_p \Delta_{)p(} = \Phi$, $\sum_p \Phi_{)p(} = 2\Psi$

we get

(4.11) $\sum_{ij} \hat r_{ij}^2 = \Phi^2 - 2\Psi\Delta$

In the summation in (4.11) $i$ and $j$ run independently of each
other through all the subscripts in the set $(\alpha, \beta \ldots \gamma)$ in which
the adjunction is taken; and the head coefficients in the right
member refer to the same set $(\alpha, \beta \ldots \gamma)$. (4.11) may be used
as a convenient check on the computation of the sumsquares
in the individual rows computed according to (4.9).
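
The identities (4.9) and (4.11) are easy to confirm numerically. The following Python sketch does so for an illustrative three-rowed correlation matrix. The helper names (`det`, `sub`, `head`, `adjoint`) and the numerical values are assumptions introduced here, not part of the original text, and the determinants are expanded naively, which is adequate only for small matrices.

```python
from itertools import combinations

def det(m):
    """Determinant by Laplace expansion; the 0-rowed determinant is taken as 1."""
    if not m:
        return 1.0
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def sub(m, keep):
    """Principal submatrix of m on the row numbers in keep."""
    return [[m[i][j] for j in keep] for i in keep]

def head(m, k):
    """Sum of all principal minors of m with k rows left out:
    k=0 gives Delta, k=1 gives Phi, k=2 gives Psi."""
    n = len(m)
    return sum(det(sub(m, c)) for c in combinations(range(n), n - k))

def adjoint(m):
    """Adjoint matrix: element (i, j) is the cofactor of m[j][i]."""
    n = len(m)
    return [[(-1) ** (i + j) * det([[m[a][b] for b in range(n) if b != i]
                                    for a in range(n) if a != j])
             for j in range(n)] for i in range(n)]

# Illustrative correlation matrix (assumed values).
r = [[1.0, 0.6, 0.3],
     [0.6, 1.0, 0.5],
     [0.3, 0.5, 1.0]]

adj = adjoint(r)
delta, phi, psi = head(r, 0), head(r, 1), head(r, 2)

# (4.9): sumsquare of each row of the adjoint.
row_lhs, row_rhs = [], []
for p in range(len(r)):
    rest = [i for i in range(len(r)) if i != p]
    r_p = sub(r, rest)                      # row and column p left out
    row_lhs.append(sum(a * a for a in adj[p]))
    row_rhs.append(phi * head(r_p, 0) - delta * head(r_p, 1))

# (4.11): total sumsquare of the adjoint.
total_lhs = sum(row_lhs)
total_rhs = phi ** 2 - 2 * psi * delta

print(row_lhs, row_rhs)
print(total_lhs, total_rhs)
```

The two printed pairs should agree up to rounding, which is exactly the use of (4.11) as a check on the row sums obtained from (4.9).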

5. REGRESSION SPREADS.

The minimising of the sumsquares of the deviation from the
regression plane may be done in different directions, for
instance in the direction of the $x_1$ axis, in that of the $x_2$ axis, etc.
These regressions we shall call the elementary regressions. The
coefficients of these regressions (when the variates are taken
in the normalised form) are nothing but the elements of the
adjoint correlation matrix. The elements in the first row of
this matrix are the coefficients of the first elementary
regression, those of the second row the coefficients of the second
elementary regression, etc.

The fact that these various sets of coefficients are nearly
proportional (in other words that all the elementary regression
planes nearly coincide) one would intuitively take as a sign
that the regression plane determined in this set of variates is
significant. I shall later discuss this idea more closely; for the
moment let us adopt it heuristically.

A first condition that would have to be fulfilled in order that
these $n$ sets of coefficients are proportional is obviously that
these sets, when considered as defining $n$ points in an
$n$-dimensional scatter diagram, lie in a plane through origin of this
space. The idea therefore suggests itself to measure the degree
of conformity between the $n$ elementary regression planes by
computing the scatterance of the regression coefficients
considered as statistical observations. These "observations" should
however not be reduced to their mean values before computing
the scatterance, since the "regression" plane in these
"observations" ought to go through origin.

This leads to computing the expression

(5.1) $\dfrac{\begin{vmatrix} \hat R_{11} & \ldots & \hat R_{1n} \\ \ldots & \ldots & \ldots \\ \hat R_{n1} & \ldots & \hat R_{nn} \end{vmatrix}}{\hat R_{11} \hat R_{22} \ldots \hat R_{nn}}$

where

(5.2) $\hat R_{ij} = \sum_k \hat r_{ik} \hat r_{kj}$.

The accent on the magnitudes $R$ in the above formula may
be interpreted as true adjunction symbols. Indeed, by virtue of
the formula for symbolic matrix multiplication we may define
$(R_{ij})$ as the square of the matrix $(r_{ij})$, in other words

(5.3) $R_{ij} = \sum_k r_{ik} r_{kj}$.

$\hat R_{ij}$ is then the adjoint of $R_{ij}$.

Similarly we may compute the corresponding coefficient (5.1)
for any subset $(\alpha, \beta \ldots \gamma)$ in the big set $(1\,2 \ldots n)$. In this latter
case the adjunction sign will of course have to be interpreted
as adjunction within the subset $(\alpha, \beta \ldots \gamma)$.

By the formulae of the preceding Section the expression (5.1)
may be considerably reduced. As a matter of fact it may be
expressed very simply by the head coefficients of the
characteristic polynomial. We first notice that the numerator of (5.1),
namely the determinant $|\hat R|$, by Sylvester's formula is $|R|^{n-1}$;
but the matrix $R$ was the symbolic square of $r$, so that

(5.4) $|\hat R| = \Delta^{2(n-1)}$.

Further, each factor in the denominator of (5.1) is by (4.9)
equal to

(5.5) $\hat R_{pp} = \Phi \cdot \Delta_{)p(} - \Delta \cdot \Phi_{)p(}$.

The expression (5.4) shows that, in order to get a coefficient
comparable to the scatterance, we ought to take the $2(n-1)$th
root of (5.1). We thus finally get

(5.6) $\dfrac{\Delta}{\sqrt[2(n-1)]{\hat R_{11} \hat R_{22} \ldots \hat R_{nn}}}$

where the $\hat R_{pp}$ are given by (5.5).

The coefficient defined by (5.6) may be called the regression
spread. It may be looked upon as a sort of corrected
scatterance, the denominator of (5.6) being the correction factor.
An important property of the coefficient (5.6) is that it is
capable of increasing if more variates are included. In this 
respect it differs from the scatterance. 

If the regression spread increases as a new variate is includ- 
ed, it may be taken as a warning signal that the inclusion of 
the new variate is not warranted. This criterion is not, 
however, final. The behavior of the regression spreads must 
be scrutinized by principles similar to those used in the study 
of the scatterances (see Section 1). 
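
A small numerical sketch may make the contrast with the scatterance concrete. The Python code below computes the regression spread (5.6) directly from the adjoint for a pair of variates and for the same pair with a third variate added. The matrix values and the function names are illustrative assumptions, not figures from the text.

```python
import math

def det(m):
    """Determinant by Laplace expansion; the 0-rowed determinant is taken as 1."""
    if not m:
        return 1.0
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def adjoint(m):
    """Adjoint matrix: element (i, j) is the cofactor of m[j][i]."""
    n = len(m)
    return [[(-1) ** (i + j) * det([[m[a][b] for b in range(n) if b != i]
                                    for a in range(n) if a != j])
             for j in range(n)] for i in range(n)]

def regression_spread(r):
    """Coefficient (5.6): Delta divided by the 2(n-1)th root of the
    product of the row sumsquares of the adjoint."""
    n = len(r)
    row_sumsquares = [sum(a * a for a in row) for row in adjoint(r)]
    return det(r) / math.prod(row_sumsquares) ** (1.0 / (2 * (n - 1)))

# Illustrative correlation matrix (assumed values) and its subset (1, 2).
r3 = [[1.0, 0.6, 0.3],
      [0.6, 1.0, 0.5],
      [0.3, 0.5, 1.0]]
r2 = [row[:2] for row in r3[:2]]

print(det(r2), det(r3))                              # scatterances
print(regression_spread(r2), regression_spread(r3))  # regression spreads
```

In this particular example the scatterance falls when the third variate is included while the regression spread rises, which is the behaviour described above. Other matrices may of course behave differently.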

6. LINE COEFFICIENTS. 

The regression spread defined in Section 5 gives an expres- 
sion for the closeness with which the adjoint of a given cor- 
relation matrix comes to being singular, but if we are looking 
for a unique regression plane it would in fact be more plausible 
to construct a criterion for the adjoint correlation matrix being 
of rank not higher than 1 (i. e. all the rows proportional). 

If the adjoint correlation matrix (so far as the systematic
variations are concerned) is of rank 1, all the two-rowed
determinants $\hat r_{ii} \hat r_{jj} - \hat r_{ij}^2$ ought to be small. Therefore, if the
average value of these determinants taken over all the subsets
$(ij)$ is small as compared with the average value of their
leading terms $\hat r_{ii} \hat r_{jj}$, we may take it as an indication that the
matrix is close to being of rank not higher than 1. The
averages here considered only consist of non-negative elements
(since $(r_{ij})$ is positive definite), so that there is no danger of
the average becoming small because positive and negative
terms cancel out. The sum of the two-rowed determinants
considered is nothing but $\Delta \Psi = \Delta \sum_{i<j} \Delta_{)ij(}$. And the sum of
the leading terms is

$\sum_{i<j} \hat r_{ii} \hat r_{jj} = \tfrac{1}{2} [\, (\sum_i \hat r_{ii})^2 - \sum_i \hat r_{ii}^2 \,]$

which reduces to

$\tfrac{1}{2} [\, \Phi^2 - \sum_i \Delta_{)i(}^2 \,]$

Hence the smallness of the expression

(6.1) $\dfrac{2 \Psi \Delta}{\Phi^2 - \sum_i \Delta_{)i(}^2}$

may be taken as a criterion for the adjoint correlation matrix
being of rank 1. The expression (6.1) may be called the line
coefficient, because it expresses the closeness with which the
normals of the various elementary regression planes come to
lying in a common line. The components of these normals are
of course nothing but the elements in the various rows of the
adjoint correlation matrix.

By (4.11) the expression for the line coefficient may also be
written

(6.2) $\dfrac{\Phi^2 - \sum_{ij} \hat r_{ij}^2}{\Phi^2 - \sum_i \hat r_{ii}^2}$

Since $\Psi$ and $\Delta$ are non-negative and the denominator of (6.1) was
originally written as a double sum of the product $\hat r_{ii} \hat r_{jj}$, which
is non-negative, it is seen that (6.1) is not less than 0. On the
other hand (6.2) shows that it must be not larger than unity
since the term that is subtracted in the numerator is not less
than the one that is subtracted in the denominator. In other
words, the coefficient considered must lie between zero and
unity.
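
The equivalence of the two forms (6.1) and (6.2), and the bounds just derived, can be illustrated by a short Python sketch. The function names and the example matrix are assumptions made for the illustration; the head coefficients are obtained by brute-force summation of principal minors rather than by the recurrence technique described below.

```python
from itertools import combinations

def det(m):
    """Determinant by Laplace expansion; the 0-rowed determinant is taken as 1."""
    if not m:
        return 1.0
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def sub(m, keep):
    """Principal submatrix of m on the row numbers in keep."""
    return [[m[i][j] for j in keep] for i in keep]

def head(m, k):
    """Sum of all principal minors with k rows left out (Delta, Phi, Psi for k=0,1,2)."""
    n = len(m)
    return sum(det(sub(m, c)) for c in combinations(range(n), n - k))

def adjoint(m):
    """Adjoint matrix: element (i, j) is the cofactor of m[j][i]."""
    n = len(m)
    return [[(-1) ** (i + j) * det([[m[a][b] for b in range(n) if b != i]
                                    for a in range(n) if a != j])
             for j in range(n)] for i in range(n)]

def line_coefficient(r):
    """Form (6.1): 2*Psi*Delta / (Phi^2 - sum of squared first minors)."""
    n = len(r)
    delta, phi, psi = head(r, 0), head(r, 1), head(r, 2)
    d = sum(det(sub(r, [k for k in range(n) if k != i])) ** 2 for i in range(n))
    return 2 * psi * delta / (phi ** 2 - d)

def line_coefficient_from_adjoint(r):
    """Form (6.2), computed from the elements of the adjoint."""
    adj = adjoint(r)
    n = len(r)
    phi = sum(adj[i][i] for i in range(n))
    all_squares = sum(a * a for row in adj for a in row)
    diagonal_squares = sum(adj[i][i] ** 2 for i in range(n))
    return (phi ** 2 - all_squares) / (phi ** 2 - diagonal_squares)

# Illustrative correlation matrix (assumed values).
r = [[1.0, 0.6, 0.3],
     [0.6, 1.0, 0.5],
     [0.3, 0.5, 1.0]]

by_61 = line_coefficient(r)
by_62 = line_coefficient_from_adjoint(r)
print(by_61, by_62)
```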

If it is wanted to compute the line coefficients in all possible
subsets, the work can most conveniently be arranged as
follows. The $\Delta$ in all subsets are supposed to be computed (for
instance by the tilling technique of Section 15). The second
head coefficient $\Phi$ is then computed in all subsets by the
recurrence formula

(6.3) $\Phi_{\alpha\beta\ldots\gamma} = \sum_{i=\alpha\beta\ldots\gamma} \Delta_{\alpha\beta\ldots)i(\ldots\gamma}$

$(\alpha\beta\ldots\gamma)$ being a given $\nu$-set. This formula is only a special
case of (2.4). For each $\nu$-level the sum of all the $\Phi$ thus computed
is checked by


(6.4) $\sum_{\alpha<\beta<\ldots<\gamma} \Phi_{\alpha\beta\ldots\gamma} = (n-\nu+1) \sum \Delta_{(\nu-1)\text{-rowed}}$


To apply this check, either the $\Delta$'s or the $\Phi$'s in the total set
must be computed directly; this is a simple matter when the
$\Delta$'s are already listed in a systematic way. For instance, if they
are computed by the tilling technique as described in Section
15, they can then quickly be copied on a listing adding
machine directly from the tilling tables. (6.4) is only a special
case of (2.5); the last member in (6.4) is obtained by using (2.7).

The recurrence computations (6.3) are most easily arranged
by keeping the $\Delta$'s written in tables with uniform spacing and
in the exact concentric order defined in Section 15. If a
listing adding machine is used for the determination of the $\Delta$'s in
the big set, such lists of the $\Delta$ are already available. On these
lists no indication of which scatterance the various figures refer
to should be given, but one should rely on the uniform spacing
and the exact order defined by the concentric numbering. In
order to pick out the correct figures to go into the recurrent
computation (6.3) we use "combination strips", that is strips
of paper or cardboard with check marks in positions that
indicate the figures to be used for any given subset $(\alpha, \beta \ldots \gamma)$. A
set of such combination strips is very useful also for other
recurrence computations of a similar kind (see for instance (6.5)
etc. below). We have found it both safer and quicker to rely
on the uniform spacing and the combination strips than to write
down on the lists of the $\Delta$'s the subset to which each $\Delta$ belongs.
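
In modern terms the tables and combination strips amount to a lookup keyed by subset. The Python sketch below is an illustration under that assumption, not a description of the original hand procedure: it stores the $\Delta$ of every subset in a dictionary, builds the $\Phi$ by the recurrence (6.3), and applies a level check based on the counting fact that each $(\nu-1)$-rowed subset is contained in exactly $n-\nu+1$ sets of $\nu$ rows. The matrix values are assumed for the example.

```python
from itertools import combinations

def det(m):
    """Determinant by Laplace expansion; the 0-rowed determinant is taken as 1."""
    if not m:
        return 1.0
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

# Illustrative four-rowed correlation matrix (assumed values).
r = [[1.0, 0.6, 0.3, 0.2],
     [0.6, 1.0, 0.5, 0.4],
     [0.3, 0.5, 1.0, 0.1],
     [0.2, 0.4, 0.1, 1.0]]
n = len(r)

# Table of the scatterances Delta, keyed by the tuple of row numbers.
delta = {(): 1.0}
for v in range(1, n + 1):
    for s in combinations(range(n), v):
        delta[s] = det([[r[i][j] for j in s] for i in s])

# Recurrence (6.3): Phi of a set is the sum of the Delta of the sets
# obtained from it by leaving out one row at a time.
phi = {s: sum(delta[s[:k] + s[k + 1:]] for k in range(len(s)))
       for s in delta if s}

def level_sum(table, v):
    """Sum of the entries of table that belong to v-rowed sets."""
    return sum(value for s, value in table.items() if len(s) == v)

# Level check: sum of all Phi on level v against (n - v + 1) times the
# sum of all (v - 1)-rowed Delta in the total set.
checks = [(level_sum(phi, v), (n - v + 1) * level_sum(delta, v - 1))
          for v in range(1, n + 1)]
print(checks)
```

Keying by subset plays the role that uniform spacing and concentric ordering play on paper: the entry needed for a given recurrence step is found by position alone.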

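When the $\Phi$ in all subsets are computed and checked, the
quantities $2\Psi$ are computed recurrently by

(6.5) $2\Psi_{\alpha\beta\ldots\gamma} = \sum_{i=\alpha\beta\ldots\gamma} \Phi_{\alpha\beta\ldots)i(\ldots\gamma}$

and checked by

(6.6) $\sum_{\alpha<\beta<\ldots<\gamma} 2\Psi_{\alpha\beta\ldots\gamma} = 2 \binom{n-\nu+2}{2} \sum \Delta_{(\nu-2)\text{-rowed}}$

This check is obtained by putting $k = \nu - 2$ in (2.5). The
check on $\Psi$ is, of course, done for each $\nu$-level in the same way
as for $\Phi$.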

Further the squares $\Delta^2$ and $\Phi^2$ are computed in all subsets,
and the sumsquares

(6.7) $D_{\alpha\beta\ldots\gamma} = \sum_{i=\alpha\beta\ldots\gamma} \Delta^2_{\alpha\beta\ldots)i(\ldots\gamma}$

are computed and checked in each subset. This is done in
precisely the same way as the $\Phi$ were built up from the $\Delta$
[compare (6.3) with (6.7)]. The checking formula written out
will now be

(6.8) $\sum_{\alpha<\beta<\ldots<\gamma} D_{\alpha\beta\ldots\gamma} = (n-\nu+1) \sum \Delta^2_{(\nu-1)\text{-rowed}}$.

The sum in the left member of (6.8) denotes the sum of all the
$D$'s computed on the level $\nu$ by means of (6.7), and the sum in
the right member is the sum of all the squares of the
$(\nu-1)$-rowed $\Delta$'s contained in the total set. This latter sum must
be computed directly in order to apply the check. All the work
(6.5) and (6.7) is done by the combination strips.

By means of the above quantities the line coefficients in all
sets are easily computed by (6.1). The difference $\Phi^2 - D$, which
is in the denominator of (6.1), may of course be verified by a
simple checksum utilizing the sum of $\Phi^2$ and the sum of $D$.

The practical use of the line coefficients is similar to that of
the regression spread and the scatterances.

In a sense the use of the line coefficients and the regression 
spreads is a little safer than that of the scatterances because 
they will actually increase when we get into multiple collinear 
sets, while the scatterances will always decrease. On the other 
hand the situation may be such that we must accept a set of 
variates even though it shows some increase in the regression 
spread or line coefficient. Whether or not a slight increase in
any of these parameters should be accepted cannot be definitely 
decided unless by the more elaborate method of Part III. We 
shall later discuss numerical examples which will illustrate 
both the advantages and the limitations of the line coefficients. 

PART II. SCATTER FUNCTIONS AND »TRUE» REGRESSIONS. 

7. INSIDE AND OUTSIDE INFLUENCES. SYSTEMATIC COMPONENTS 
AND DISTURBANCES. THE NOTION OF DISTURBING INTENSITY. 

Although the various empirical test-parameters considered,
scatterances, characteristic roots, beam coefficients, etc. throw
some light on the question of confluency, they do not in all
cases furnish a conclusive criterion.

To advance any further in the matter it seems that we need
a more systematic analysis based on certain definite
assumptions about the nature of the variates. The present Part II is
concerned with this type of analysis. It leads up to a certain
statistical technique: the bunch analysis, which will be discussed
in the two following parts.

Let us assume that each of the observed variates $x_1 \ldots x_n$
depends on certain other "basic" variates $y_1 \ldots y_M$ which are
not observed, but which have actually been present and have
determined each of the observed results. If the connection is
linear, the dependency can be written in the form

(7.1) $x_i = \sum_K p_{iK} y_K$

where the $p_{iK}$ are constants. The $n$-rowed and $M$-columned
matrix

(7.2) $\|p_{iK}\| = \begin{Vmatrix} p_{11} & \ldots & p_{1M} \\ \ldots & \ldots & \ldots \\ p_{n1} & \ldots & p_{nM} \end{Vmatrix}$

characterises the way in which the observational variates are
built up from the basic variates.

For the moment we do not specify any further the concrete
nature of the causal relations between the $y$'s and the $x$'s; we
just take (7.1) as a conceptual pattern which may serve as a
starting point for the analysis.

In many cases it seems plausible to formulate the conception
of the basic variates in such a way that they become
uncorrelated. In this case any amount of correlation that has been
observed between the $x$'s will be due simply to the fact that
one or more of the $y$'s occur in more than one $x$. However,
in practice (for instance if one actually tries to produce a
numerical example illustrating a connection of the form (7.1))
it is not plausible to assume exact noncorrelation between
the basic variates. In many cases it will be necessary to work
with the more or less vague assumption that the correlation
between these variates shall be "small", or that it shall be
determined only by "random" fluctuations.

We assume the $x$'s as well as the $y$'s to be measured from
their means. Furthermore, it does not restrict generality if we
assume all the basic variates to have unit sumsquare. This only
involves a corresponding interpretation of the coefficients $p_{iK}$.
In other words we assume that

(7.3) $[y_K^2] = 1$ $(K = 1, 2 \ldots M)$

where [ ] denotes a summation over all the observations. When
the assumption (7.3) is made, the coefficients $p_{iK}$ express the
relative importance of the various basic variates in the
determination of a given observed variate.

If (7.3) is fulfilled, the cross moment of the variates $y_H$ and
$y_K$ is the same as the correlation coefficient between them,
namely

(7.4) $s_{HK} = [y_H y_K]$.

The cross moment between the observed variates $x_i$ and $x_j$
will be

(7.5) $m_{ij} = [x_i x_j] = \sum_{HK} s_{HK} \, p_{iH} \, p_{jK}$

$H$ and $K$ running independently of each other through all the
numbers $1, 2 \ldots M$. In other words the cross moment is
nothing but the value assumed by the bilinear form built over
the matrix $s_{HK}$, when the variates in this form are put equal to
the $i$-th and $j$-th row in (7.2) respectively.

If the basic variates are assumed exactly noncorrelated, (7.5)
reduces to

(7.6) $m_{ij} = \sum_K p_{iK} \, p_{jK}$.

If we further assume that the units of measurement are
chosen so that the observational variates satisfy a relation
analogous to (7.3), (7.6) will at the same time be the correlation
coefficient between the observational variates $x_i$ and $x_j$. If we
finally assume that $M < n$, we get the special hypothesis
regarding the correlation coefficients on which Spearman's two factor
theory and Thurstone's multifactor theory are built. Here we
shall not follow up this line of approach.

In general we shall assume that $M$ may be a number larger
than $n$. A great number of different cases must then be
envisaged according to how an interconnection between the $x$'s
may be produced through the $y$'s.

Consider first two observational variates $x_i$ and $x_j$. The
basic variates $y$ may then be classified in three groups
according as they occur in both variates $x_i$ and $x_j$, in only one of
them, or in none of them. Those basic variates that occur in
both observational variates $x_i$ and $x_j$ (with a coefficient $p$
different from zero) will be said to constitute an inside or systematic
influence on the observed interconnection between $x_i$ and $x_j$,
while those basic variates that occur in one of the two
observational variates, but not in the other, will be said to constitute
an outside influence, or an accidental disturbance on the
variation of the observational variate in question. Those basic
variates that do not occur in either of the two observational
variates are of no interest in this connection.
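
The pattern (7.1), the reduction (7.6) and the classification just given can be illustrated on a toy example. In the Python sketch below, the three basic variates, the coefficient matrix `p` and the helper names are all assumptions chosen so that the basic variates are exactly uncorrelated with zero mean and unit sumsquare; they are not data from the text.

```python
import math

# Three basic variates over four observations: mean zero, unit sumsquare,
# mutually uncorrelated (values chosen for illustration).
s2 = math.sqrt(2.0)
y = [[1 / s2, -1 / s2, 0.0, 0.0],   # enters x_1 only
     [0.0, 0.0, 1 / s2, -1 / s2],   # enters x_2 only
     [0.5, 0.5, -0.5, -0.5]]        # enters both x_1 and x_2

# Coefficient matrix p[i][K] of (7.1); rows are the observational variates.
p = [[0.6, 0.0, 0.8],
     [0.0, 0.5, 1.2]]

def moment(u, v):
    """Gaussian bracket [u v]: sum of products over all observations."""
    return sum(a * b for a, b in zip(u, v))

# Observational variates x_i = sum_K p_iK y_K.
x = [[sum(p[i][k] * y[k][t] for k in range(3)) for t in range(4)]
     for i in range(2)]

# Cross moments computed from the observations and from formula (7.6).
observed = [[moment(x[i], x[j]) for j in range(2)] for i in range(2)]
by_formula = [[sum(p[i][k] * p[j][k] for k in range(3)) for j in range(2)]
              for i in range(2)]

def role(k, i, j):
    """Classify basic variate k with respect to the pair of variates (i, j)."""
    hits = sum(1 for row in (i, j) if p[row][k] != 0.0)
    return ["of no interest", "disturbance", "systematic"][hits]

roles = [role(k, 0, 1) for k in range(3)]
print(observed, by_formula, roles)
```

Only the third basic variate contributes to the cross moment between the two observational variates; the first two affect the sumsquares alone, which is what marks them as disturbances.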

If there is at least one basic variate that thus occurs in both
the observational variates considered, we shall say that there
exists an inside or systematic connection between these two
observational variates; otherwise they will be said to be
systematically unconnected. The connection here considered
may be called inside or systematic in the restricted sense to
distinguish it from another sort of inside influence to be
considered presently.

Next, consider a set of $\nu$ observational variates, say

(7.7) Nos. $\alpha, \beta \ldots \gamma$.

A given basic variate may now be classified in one of three
groups according as it occurs in more than one of the
observational variates in this set, or in just one, or in none. In the
first case the variate in question will be said to exert an
influence that is inside the set $(\alpha, \beta \ldots \gamma)$ or is systematic with
respect to this set; in the second case the basic variate will as
before be said to represent a disturbance. The third case has
no interest in connection with the set $(\alpha, \beta \ldots \gamma)$.

Thus, the more we enlarge the set of observational variates, the
larger will be the portion of their determining factors that must
be considered as "inside" or "systematic", and the smaller will
be the part that is still left in the category of "disturbances".
Furthermore, if we include in the statistical analysis some new
observational variate in order to explain some of the lack of
fit which we had in the original set, we must not forget that in
practice this always means that we introduce a new complex,
of which perhaps only a small part is systematically connected
with those observational variates we originally considered. It
is just this fact that creates the real problems of confluence
analysis.

There are many reasons which prevent us from ever reaching
a situation where all possible fluctuations are exactly taken
account of by those variates we have included in the analysis.
A smaller or larger part of the fluctuations must always be left
in the indiscriminate category of "disturbances". In each
particular problem nature has drawn a more or less distinct
line of demarcation between those fluctuations which we may
hope to explain by studying simultaneously certain specified
observational variates and the inexplicable rest. This justifies
speaking of "systematic variation" and "random disturbances"
in a semi-absolute sense, the set of variates within which the
expressions shall be understood being defined in a more or less
precise way by the very nature of the problem considered. As
a rule when we use the terms "systematic variates" and
"disturbances" without further specification, we take them in this
semi-absolute sense.

If in some way or another a distinction has been drawn
between those basic variates which create the inside and those
that create the outside influence, we may write each
observational variate as the sum of two components

(7.8) $x_i = x_i' + x_i''$

$x_i'$ being the systematic part of $x_i$ (that is, the part that is
systematically connected with the other variates considered)
and $x_i''$ its "disturbance".

Let $1, 2 \ldots N$ be the numbering of the observations, and let,
as before, the Gaussian symbol [ ] denote a summation over all
the observations. Since we assume all the variates to be
measured from their means, we have $[x_i] = [x_i'] = [x_i''] = 0$.

The observed cross moment $m_{ij} = [x_i x_j]$ is equal to

(7.9) $m_{ij} = m'_{ij} + [x_i' x_j''] + [x_i'' x_j'] + [x_i'' x_j'']$

where $m'_{ij}$ is the cross moment of the systematic parts of the
variates $i$ and $j$.

For $i = j$ (7.9) reduces to

(7.10) $m_{ii} = m'_{ii} + [x_i''^2] + 2[x_i' x_i'']$.
The decomposition (7.8) and the ensuing formulae (7.9) and
(7.10) have here been derived from the notion of basic variates
and the formula (7.1). But there is obviously nothing that
prevents us from taking the decomposition (7.8) independently
as the starting point of the analysis.

There are three assumptions which may suggest themselves
regarding the nature of the correlation between the variates
involved in (7.9) and (7.10).

I. There is no correlation between the accidental parts of
two different observational variates.

II. There is no correlation between the accidental part of one
observational variate and the systematic part of another.

III. There is no correlation between the accidental and
systematic parts of a given observational variate.

If the assumptions (I) and (II) are made we get

(7.11) $m_{ij} = m'_{ij}$ for $i \neq j$

and if the assumption (III) is made we get

(7.12) $m'_{ii} = m_{ii}(1 - \lambda_i)$

where

(7.13) $\lambda_i = [x_i''^2] / m_{ii}$.

$\lambda_i$ may be called the disturbing intensity of the variate $x_i$; it is
the ratio between the sumsquare of the disturbance of $x_i$ and
the sumsquare of the observed $x_i$ itself.

The two formulae (7.11) and (7.12) can be expressed in the
single formula

(7.14) $m'_{ij} = m_{ij}(1 - e_{ij}\lambda_i)$

where $e_{ij}$ is 1 for $i = j$ and 0 otherwise. By the definition (7.13)
$\lambda_i$ must be a non-negative quantity, that is

(7.15) $0 \leq \lambda_i$.

The observed variance must therefore be larger than the
systematic variance if assumption (III) is satisfied. The same
would however be true even on a less rigorous assumption. We
only need to assume:

(III') There is not such a high inverse correlation between
the accidental part of $x_i$ and $x_i'$ itself that the third term in
(7.10) outweighs the second.

If this is the case, we still have the equation (7.12) and the
inequality (7.15); the only difference is that $\lambda_i$ will now be
defined by

(7.16) $\lambda_i = ([x_i''^2] + 2[x_i' x_i'']) / m_{ii}$.

In the following we shall as a rule assume (III') instead of
(III).

Assumptions of the kind we have expressed in (I) to (III)
must of course be interpreted cum grano salis. In practice we
must reckon with the possibility that they are not exactly, but
only approximately fulfilled.

It may be noted that the assumption (III) does not exclude
the possibility of the size of the disturbance changing
systematically with the size of the variate itself. (III) only involves
that there is such a mixture of positive and negative
disturbances that in the total result no correlation appears between
$x_i'$ and its disturbance.

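Let us normalise the variates by dividing each variate by
the square root of its observed sumsquare. The systematic and
accidental parts of each variate must consequently also be
conceived of as divided by this magnitude. The moments of the
systematic variates thus normalised will be $r_{ij}(1 - e_{ij}\lambda_i)$, which
may also be written

(7.17) $r_{ij} - e_{ij}\lambda_i$ = moments of systematic parts of
empirically normalised variates.

In other words the "true" moment matrix of the normalised
variates will be

(7.18) $\begin{pmatrix} 1-\lambda_1 & r_{12} & \ldots & r_{1n} \\ r_{21} & 1-\lambda_2 & \ldots & r_{2n} \\ \ldots & \ldots & \ldots & \ldots \\ r_{n1} & r_{n2} & \ldots & 1-\lambda_n \end{pmatrix}$

where $\lambda_i$ = disturbing intensities defined by (7.16).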





Since the true correlation coefficients are obtained by
normalising the true moment matrix, we can also write

(7.19) $\rho_{ij} = \dfrac{r_{ij}}{\sqrt{(1-\lambda_i)(1-\lambda_j)}}$ $(i \neq j)$

where $\rho_{ij}$ are the true correlation coefficients and $r_{ij}$ the
observed. This equation follows of course also directly from
(7.11) and (7.12). If we insert the expression (7.19) for $r_{ij}$
in (7.17) or (7.18), we see that the "true" moments of the
(empirically) normalised variates may also be written

(7.20) $\rho_{ij} \sqrt{(1-\lambda_i)(1-\lambda_j)}$

Here the true moments of the (empirically) normalised variates
are expressed exclusively in terms of the true correlations and
the disturbances.
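
As a worked illustration of (7.18) to (7.20), the following Python sketch takes an observed correlation and a pair of hypothetical disturbing intensities and computes the "true" moment matrix and the "true" correlation. The numerical values and the function names are assumptions made for the example.

```python
import math

def true_moments(r, lam):
    """Matrix (7.18): observed correlations with 1 - lambda_i on the diagonal."""
    n = len(r)
    return [[r[i][j] - (lam[i] if i == j else 0.0) for j in range(n)]
            for i in range(n)]

def true_correlations(r, lam):
    """Formula (7.19): rho_ij = r_ij / sqrt((1 - lambda_i)(1 - lambda_j))."""
    n = len(r)
    return [[1.0 if i == j else
             r[i][j] / math.sqrt((1 - lam[i]) * (1 - lam[j]))
             for j in range(n)] for i in range(n)]

r = [[1.0, 0.4],
     [0.4, 1.0]]
lam = [0.2, 0.5]          # hypothetical disturbing intensities

t = true_moments(r, lam)
rho = true_correlations(r, lam)

# (7.20): rho_ij * sqrt((1 - lambda_i)(1 - lambda_j)) gives back (7.18).
back = [[rho[i][j] * math.sqrt((1 - lam[i]) * (1 - lam[j])) for j in range(2)]
        for i in range(2)]

print(t, rho, back)
```

With these assumed intensities an observed correlation of 0.40 corresponds to a "true" correlation of about 0.63: the disturbances have diluted the systematic connection.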

8. THE SCATTER FUNCTIONS AND THEIR TAYLOR EXPANSION.
THE CUSHION EFFECT.

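The discussion of the last Section suggests that the function of
$\lambda_1 \ldots \lambda_n$ obtained by taking the determinant of (7.18), and also
the similar functions for any given subset $(ij \ldots k)$, will play an
important role in any analysis of confluency of regression
equations. These functions we shall call the scatter functions. They
will be denoted $F(\lambda_1 \ldots \lambda_n)$ for the big set and $F_{ij\ldots k}(\lambda_i, \lambda_j \ldots \lambda_k)$
for the subset $(ij \ldots k)$, so that

(8.1) $F_{ij\ldots k}(\lambda_i \ldots \lambda_k) = \begin{vmatrix} 1-\lambda_i & r_{ij} & \ldots & r_{ik} \\ r_{ji} & 1-\lambda_j & \ldots & r_{jk} \\ \ldots & \ldots & \ldots & \ldots \\ r_{ki} & r_{kj} & \ldots & 1-\lambda_k \end{vmatrix} = (1-\lambda_i)(1-\lambda_j)\ldots(1-\lambda_k) \begin{vmatrix} 1 & \rho_{ij} & \ldots & \rho_{ik} \\ \rho_{ji} & 1 & \ldots & \rho_{jk} \\ \ldots & \ldots & \ldots & \ldots \\ \rho_{ki} & \rho_{kj} & \ldots & 1 \end{vmatrix}$

If the $\lambda$'s are put equal to the (so far unknown) disturbing
intensities of the variates, the functions $F_{ij\ldots k}$ simply become
the "true" scatterances. The last expression in (8.1) is
obtained by inserting in the middle expression of (8.1) the
expression for $r_{ij}$ taken from (7.19).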
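Consider the $\nu$-dimensional subset $(ij \ldots k)$, and let $\bar\lambda_i, \bar\lambda_j \ldots \bar\lambda_k$
be any set of given values of the arguments $\lambda$. The Taylor
expansion of (8.1) is easily found to be

(8.2) $F_{ij\ldots k}(\lambda_i \ldots \lambda_k) = \bar F_{ij\ldots k} - \sum_\alpha (\lambda_\alpha - \bar\lambda_\alpha) \bar F_{ij\ldots)\alpha(\ldots k} + \sum_{\alpha<\beta} (\lambda_\alpha - \bar\lambda_\alpha)(\lambda_\beta - \bar\lambda_\beta) \bar F_{ij\ldots)\alpha\beta(\ldots k} - \ldots + (-1)^\nu (\lambda_i - \bar\lambda_i)(\lambda_j - \bar\lambda_j) \ldots (\lambda_k - \bar\lambda_k)$.

Here $\alpha$ in $\sum_\alpha$ runs through all the $\nu$ affixes $ij \ldots k$; $(\alpha\beta)$ in $\sum_{\alpha<\beta}$ runs
through combinations without repetition of the two affixes $\alpha$ and $\beta$
picked in the set $ij \ldots k$, etc. $\bar F_{ij\ldots k}$ denotes the value of $F_{ij\ldots k}$
when the $\lambda$'s are put equal to the $\bar\lambda$'s; and the inverted parentheses
denote "exclusion of".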

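Putting all the $\bar\lambda = 0$ we get the expansion of the "true"
scatterances in terms of the observed, namely

(8.3) $F_{ij\ldots k} = \Delta_{ij\ldots k} - \sum_\alpha \lambda_\alpha \Delta_{ij\ldots)\alpha(\ldots k} + \sum_{\alpha<\beta} \lambda_\alpha \lambda_\beta \Delta_{ij\ldots)\alpha\beta(\ldots k} - \ldots + (-1)^\nu \lambda_i \lambda_j \ldots \lambda_k$

and putting $\lambda = 0$, replacing $\bar\lambda$ by $\lambda$, we get the expansion of the
observed scatterances in terms of the "true", namely

(8.4) $\Delta_{ij\ldots k} = F_{ij\ldots k} + \sum_\alpha \lambda_\alpha F_{ij\ldots)\alpha(\ldots k} + \sum_{\alpha<\beta} \lambda_\alpha \lambda_\beta F_{ij\ldots)\alpha\beta(\ldots k} + \ldots + \lambda_i \lambda_j \ldots \lambda_k$.

From (8.3) we obtain, of course, immediately the formula
(2.2) for the characteristic polynomial by putting all the $\lambda$'s
equal.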

Since all the $\lambda$ are non-negative, and the true scatterances
are positive definite determinants, (8.4) shows immediately that
an observed scatterance must always (when the conditions
I to III of Section 7 hold good) be larger than the corresponding
"true" scatterance. In other words, even though the true
scatterance $F_{ij\ldots k}$ is rigorously zero, any amount of accidental
disturbance will immediately act as a sort of "cushion" that
prevents the observed scatterance from falling down to zero.
The thickness of the cushion depends on the intensity of the
disturbances. To a first approximation, when only terms linear
in the disturbing intensities are retained, the cushion effect is by
(8.3) proportional to the weighted sum of the intensities, the
weights being the observed scatterances in the first subsets.
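
The cushion effect can be made tangible with a constructed example. In the Python sketch below the systematic parts of three variates are taken to be exactly proportional, so that the "true" scatterance is zero, and hypothetical disturbing intensities are then added on the diagonal. The expansion (8.4) is evaluated term by term and compared with the observed scatterance. All numerical values and helper names are assumptions made for the illustration.

```python
from itertools import combinations
import math

def det(m):
    """Determinant by Laplace expansion; the 0-rowed determinant is taken as 1."""
    if not m:
        return 1.0
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def sub(m, keep):
    """Principal submatrix of m on the row numbers in keep."""
    return [[m[i][j] for j in keep] for i in keep]

# Hypothetical disturbing intensities, and a "true" moment matrix of rank 1
# with diagonal 1 - lambda_i, so that its determinant vanishes.
lam = [0.2, 0.5, 0.36]
a = [math.sqrt(1 - l) for l in lam]
true_m = [[a[i] * a[j] for j in range(3)] for i in range(3)]

# Observed correlation matrix: the disturbances restore the unit diagonal.
r = [[true_m[i][j] + (lam[i] if i == j else 0.0) for j in range(3)]
     for i in range(3)]

def expansion(true_m, lam):
    """Right member of (8.4): for every group of rows left out, the product
    of their lambda times the true scatterance of the remaining rows."""
    n = len(true_m)
    total = 0.0
    for size in range(n + 1):
        for left_out in combinations(range(n), size):
            keep = [i for i in range(n) if i not in left_out]
            total += math.prod(lam[i] for i in left_out) * det(sub(true_m, keep))
    return total

true_scatterance = det(true_m)
observed_scatterance = det(r)
print(true_scatterance, observed_scatterance, expansion(true_m, lam))
```

Although the systematic parts are perfectly collinear, the observed scatterance in this example is 0.28, far from zero; the whole of it is cushion.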

A similar remark applies to the characteristic roots. Any
sort of random disturbances will for instance immediately
produce a systematic bias in the direction of increasing the
minimal $\lambda$-root. Indeed, if all the $\lambda_\alpha$ are replaced by $\lambda$ and all
the $\bar\lambda_\alpha$ by $\lambda_\alpha + \lambda$ in (8.2) we get

(8.5) $P(\lambda) = \bar P(\lambda) + \sum_\alpha \lambda_\alpha \bar P_{)\alpha(}(\lambda) + \sum_{\alpha<\beta} \lambda_\alpha \lambda_\beta \bar P_{)\alpha\beta(}(\lambda) + \ldots + \lambda_i \lambda_j \ldots \lambda_k$

where $P(\lambda)$ is the observed and $\bar P(\lambda)$ the "true" characteristic
polynomial in the set $(ij \ldots k)$, $\bar P_{)\alpha(}(\lambda)$ the true characteristic
polynomial in the subset where $\alpha$ is left out, etc. $\lambda_\alpha$ etc. in
(8.5) are the disturbing intensities. Since the true
characteristic polynomial, when written in determinant form, is positive
definite for all values of $\lambda$ not larger than the true minimal
root, the development (8.5) shows that as long as we are in the
interval between zero and the smallest of the "true"
characteristic roots (the latter limit included) we must have $P(\lambda) > \bar P(\lambda)$,
unless all the disturbing intensities are zero, in which case
$P(\lambda) = \bar P(\lambda)$. Consequently the observed minimal root must be
larger than the true, whenever disturbances are present.

More generally the whole appearance of the observed scatter
will be influenced by the cushion effect. The shape of the scatter
is indeed produced jointly by two different sets of factors. It
depends, not only on the "true" relations that exist between
the systematic variates, but it is systematically biassed by the
disturbances. This is just why it is so difficult to apply
scatterances or characteristic roots as criteria of linear
confluency. And this is also the reason why the shape of the
regression ellipsoid (the direction and length of its axes) is not a
good indicator of those traits of the distribution which we try to
lay bare when we study linear confluency. The things which
we take as our landmarks when we study the main axes of
the ellipsoid are indeed produced just as much, or perhaps
even more, by the cushion effect as by the systematic
connection between the variates. This applies particularly to those
cases where there is multiple collinearity between the
systematic variates, and where consequently a confluence analysis is
particularly needed. In this case the direction and length of
some of the axes of the regression ellipsoid will depend
primarily on the relative size of the disturbing intensities
and not on the nature of the "true" relations that exist between
the variates.


9. THE »TRUE» REGRESSIONS. 

Suppose that we have a set of variates, say Nos. $(ij \ldots k)$, and
that we actually know the disturbing intensities in the set.
Suppose that these intensities are such that they actually make
the scatter function $F_{ij\ldots k}$ vanish, but leaving at least one of
its first principal minors different from zero, and of course
making it a positive definite matrix, otherwise it could not be the
moment matrix for the "true" variates. The systematic parts
of the observed variates would then be connected by a linear
relation, and we could actually determine this linear relation
exactly. Indeed, if the "true" regression is written in the form

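(9.0) $a_i x_i' + a_j x_j' + \ldots + a_k x_k' = 0$

or in normal coordinates

(9.1) $\alpha_i \xi_i' + \alpha_j \xi_j' + \ldots + \alpha_k \xi_k' = 0$

where $\xi$ stand for the empirically normalised variates

(9.2) $\xi_i = x_i / \sigma_i$

$\sigma_i$ being the observed standard deviation of the variate No. $i$,
then the coefficients $\alpha_i$ would simply be the solution of the
homogeneous system

(9.3) $\alpha_j (1 - \lambda_j) + \sum_{i \neq j} \alpha_i r_{ij} = 0$ $(j = 1, 2 \ldots n)$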

In other words the $\alpha$'s could simply be put proportional to the
elements in a row of the adjoint of the determinant $F_{ij\ldots k}$.

All these rows would be proportional since $F_{ij\ldots k} = 0$. And
at least one of the rows would not consist exclusively of zeros
(since we have assumed at least one of its first minors to be
different from zero). Incidentally the absolute values of the
regression coefficients could also be determined by the square
roots of the diagonal elements in the adjoint (which are all
non-negative according to our assumptions).
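
The determination of a "true" regression from the adjoint can be sketched on a constructed case. In the Python code below the "true" moment matrix `t` is an assumed example, built so that the systematic parts satisfy $0.5\,\xi_1' + 0.5\,\xi_2' - \xi_3' = 0$; it is not data from the text, and the helper names are likewise illustrative.

```python
import math

def det(m):
    """Determinant by Laplace expansion; the 0-rowed determinant is taken as 1."""
    if not m:
        return 1.0
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def adjoint(m):
    """Adjoint matrix: element (i, j) is the cofactor of m[j][i]."""
    n = len(m)
    return [[(-1) ** (i + j) * det([[m[a][b] for b in range(n) if b != i]
                                    for a in range(n) if a != j])
             for j in range(n)] for i in range(n)]

# Assumed "true" moment matrix (diagonal 1 - lambda_i, off-diagonal r_ij)
# whose determinant vanishes while its first principal minors do not.
t = [[0.800, 0.200, 0.500],
     [0.200, 0.500, 0.350],
     [0.500, 0.350, 0.425]]

adj = adjoint(t)

# Every row of the adjoint, rescaled so that its last element is -1.
coefficients = [[a / -row[2] for a in row] for row in adj]

# The first row inserted in the homogeneous system: all residuals vanish.
residuals = [sum(t[j][i] * adj[0][i] for i in range(3)) for j in range(3)]

# Absolute values of the coefficients from the diagonal of the adjoint.
absolute = [math.sqrt(adj[i][i]) for i in range(3)]

print(coefficients, residuals, absolute)
```

All three rows give the same coefficients (0.5, 0.5, -1) after rescaling, and the square roots of the diagonal elements stand in the same ratio apart from sign.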
If the disturbing intensities are not known exactly, but it is
plausible to assume certain intervals in which they must lie,
we can determine certain limits for the "true" regression
coefficients.

Let us take the case $n = 2$ as an example. We assume that
the observed correlation matrix is given. In the case $n = 2$,
all the information contained in this matrix is simply the value
of the correlation coefficient $r_{12}$ between the two variates $x_1$
and $x_2$. Already the fact that we have given the value of $r_{12}$
imposes a restriction on the possible hypotheses regarding the
size of the disturbing intensities $\lambda_1$ and $\lambda_2$. Indeed, not all such
hypotheses are compatible with the observed value of $r_{12}$.
Which hypotheses are so compatible? Obviously those, and
only those, that make the "true" moment matrix, which would
result from the hypothesis, a positive definite matrix. This
means that we have

(9.4) $\lambda_1 \leq 1$, $\lambda_2 \leq 1$

(9.5) $(1-\lambda_1)(1-\lambda_2) \geq r_{12}^2$.

Graphically these conditions may 
be exhibited by saying that the 
point (XiX^) must lie in the area 
in Figure 7 below and to the 
right of the two straight lines de- 
fined by (9. 4), and at the same 
time below the equilateral hyper- 
bola defined by (9. 5) (In Fi- 
gure 7 rij = 0.40). Obviously 
the latter condition is sharper 
than the former, so we may 
disregard (9. 4) and only con- 
sider (9. 5). 

Fig. 7.

Of more interest than (9.4) are the conditions

(9.6)   $0 \le \lambda_1$,   $0 \le \lambda_2$

that give lower limits for the intensities. This condition is
essentially connected with the assumption (III) or (III') of
Section 7; (9.6) together with (9.5) define the shaded area in
Figure 7 as the possibility region for $(\lambda_1, \lambda_2)$. The
larger the square correlation $r_{12}^2$, the smaller this shaded area.
As $r_{12}^2$ approaches unity the area reduces to a point, namely the
origin, which means that in the case of perfect observed correlation
there is only one hypothesis possible, namely that no disturbance is
present. On the other hand if $r_{12}$ approaches zero, the possibility
region will at the limit fill the whole square of Figure 7.

The condition (9.5) in conjunction with (9.6) entails

(9.7)   $\lambda_1 \le 1 - r_{12}^2$,   $\lambda_2 \le 1 - r_{12}^2$

These latter inequalities express an independent upper limit for
each disturbing intensity in terms of observed parameters, but
of course this limit is in general not so sharp as the joint limit
(9.5).
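The possibility region can be written as a simple membership test. This is a minimal sketch; the function names are illustrative, and the region is taken to be the part of the unit square on or below the hyperbola (9.5).

```python
def is_admissible(lam1, lam2, r12):
    """True if (lam1, lam2) lies in the possibility region for the observed r12."""
    # Conditions (9.4) and (9.6): both intensities between 0 and 1.
    in_unit_square = 0.0 <= lam1 <= 1.0 and 0.0 <= lam2 <= 1.0
    # Condition (9.5): on or below the equilateral hyperbola.
    below_hyperbola = (1.0 - lam1) * (1.0 - lam2) >= r12 ** 2
    return in_unit_square and below_hyperbola


def upper_limit(r12):
    """Independent upper limit (9.7) for each single intensity."""
    return 1.0 - r12 ** 2
```

For $r_{12} = 0.40$, the value used in Figure 7, the point (0.5, 0.5) is admissible while (0.8, 0.8) is not, and no single intensity can exceed 0.84.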

If we supplement the information contained in the observed
$r_{12}$ with the assumption that the systematic variations of the
two variates are linearly dependent, the possibility region is of
course further limited. In this case we must have

(9.8)   $\begin{vmatrix} 1 - \lambda_1 & r_{12} \\ r_{21} & 1 - \lambda_2 \end{vmatrix} = 0$

so that the point $(\lambda_1, \lambda_2)$ must now lie on the limiting
hyperbola itself; in other words there is now only a one dimensional
freedom in our choice of hypotheses about $(\lambda_1, \lambda_2)$. This
entails definite limits for the "true" regression coefficients. Indeed,
if $\lambda_1$ and $\lambda_2$ are so chosen that (9.8) is fulfilled, the
regression coefficients are given by any of the rows of the adjoint of
(9.8), that is by any row in

(9.9)   $\begin{vmatrix} 1 - \lambda_2 & -r_{12} \\ -r_{21} & 1 - \lambda_1 \end{vmatrix}$

We may for instance put

(9.10)   $\xi_1 = \dfrac{r_{12}}{1 - \lambda_2}\,\xi_2$

where $\xi_1$, $\xi_2$ are the normalised observed variates. What are
the limits of variation for the regression coefficient
$r_{12}/(1 - \lambda_2)$ when the point $(\lambda_1, \lambda_2)$ varies on
the possibility hyperbola of Figure 7? Obviously this regression
coefficient becomes the largest or smallest in absolute value in the
points A and B respectively. In these two extremum points we have
$1 - \lambda_1 = r_{12}^2$ and $1 - \lambda_2 = r_{12}^2$ (the other
intensity being zero in each case).

Hence: The "true" regression coefficient between the normalised
variates $\xi_1$ and $\xi_2$ must lie between $r_{12}$ and $1/r_{12}$.
In other words:

(9.11) The "true" linear regression between two variates must lie
between the two elementary regressions.

This result is deduced on the assumption that there actually
exists a linear relation between the systematic parts of the two
variates and further that (I), (II) and (III') of Section 7 are
fulfilled.
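The limits in (9.11) can be checked by moving a point along the limiting hyperbola and computing the slope from either row of the adjoint (9.9). The sketch below assumes $r_{12} = 0.40$ as in Figure 7; the function name and the number of steps are illustrative choices.

```python
def slope_on_hyperbola(lam2, r12):
    """'True' slope of xi_1 on xi_2 for a point on the limiting hyperbola (9.8)."""
    lam1 = 1.0 - r12 ** 2 / (1.0 - lam2)       # solve (9.8) for lam1
    slope_row1 = r12 / (1.0 - lam2)            # first row of the adjoint (9.9)
    slope_row2 = (1.0 - lam1) / r12            # second row of the adjoint (9.9)
    assert abs(slope_row1 - slope_row2) < 1e-9  # both rows give the same line
    return slope_row1


r12 = 0.40
steps = 50
lam2_max = 1.0 - r12 ** 2                      # end point of the admissible arc
slopes = [slope_on_hyperbola(lam2_max * k / steps, r12) for k in range(steps + 1)]
```

The slopes run from $r_{12}$ at one end of the arc to $1/r_{12}$ at the other, which are the two elementary regressions in normalised coordinates.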

This suggests that similar limits may be obtained by studying
in several variates the condition that the scatter functions
(8.1) shall be positive definite. The analogue of (9.7) is for
instance found to be

(9.12)   $\lambda_\alpha \le \dfrac{\Delta_{ij \dots k}}{\Delta_{ij \dots )\alpha( \dots k}}$

where (ij...k) is any set containing $\alpha$, $\Delta$ denotes the
observed scatterance, and $)\alpha($ indicates that the variate $\alpha$
is left out of the set. The limit (9.12) is all the narrower the more
inclusive the set (ij...k).

So far as the regression coefficients themselves are concerned,
the situation becomes however much more complex in several variates,
particularly because it may now be possible to find points ("subset
points") where not only the highest order scatter function vanishes,
but also all those of next lower order. At the Institute considerable
work, both theoretical and numerical, has been devoted to clearing up
this matter in the general case of n variates, but the results are not
yet in such a shape as to justify publication. I shall therefore here
confine myself to the above example in two variates. This example will
be sufficient to suggest heuristically one of the leading ideas which
will be utilised in Part III.

It was explained above how the "true" regression in any
number of variates can be determined, when the conditions of
Section 7 hold good and the disturbing intensities are assumed
given. The regression coefficients are then simply proportional
to the elements in a row of the adjoint of the scatter function. The
determination of the "true" regression can also be thrown into
another form which is independent of the conditions of Section
7 and which brings forth another set of parameters characterising
the nature of the disturbances.

If the observed variates consist of two parts as indicated in
(7.8), and if there exists a linear relation between the
systematic parts of the variates, we must have

(9.13)   $a_1 x_1 + \dots + a_n x_n = u$

where the $a_k$ are constants (the "true" regression coefficients),
the $x_k$ the observed variates, and $u$ a variate that may be looked
upon as a shift of the regression plane. In terms of the
disturbances of the individual variates, here written $x_k''$, the
shift $u$ may be expressed as

(9.14)   $u = a_1 x_1'' + \dots + a_n x_n''$.

For a moment let us disregard the composition of $u$ as defined
by (9.14) and let us just consider it as a variate defining the
shift of the equation (9.13). Multiplying (9.13) by $x_i$ and
performing a summation over all the observations, we get

(9.15)   $\sum_k m_{ik} a_k = [u x_i]$

where $m_{ik} = [x_i x_k]$ are the observed moments.

For the coefficients $\alpha$ of the normalised regression equation
(that is of (9.1)) we have

(9.16)   $\alpha_k = a_k \sigma_k = a_k \sqrt{m_{kk}/N}$

consequently

(9.17)   $\sum_k r_{ik}\,\alpha_k = \varepsilon \rho_i$

where

(9.18)   $\varepsilon = \sqrt{[uu]/N}$

is the standard deviation of the shift, and

(9.19)   $\rho_i = \dfrac{[x_i u]}{\sqrt{m_{ii}\,[uu]}}$

the correlation coefficient between the observed variate $x_i$ and
the shift.

The equation (9.17) is perfectly general, and so far does not
depend on any assumption about the shift. If the disturbance
correlations $\rho_i$ are given, then (9.17) permits a unique
determination of the regression coefficients. Indeed, the
determinant of the system (9.17) is the determinant of the actually
observed correlation matrix, which will never be zero in practice.
The fact that the standard deviation $\varepsilon$ of the shift occurs
in the right member of (9.17) is unimportant; indeed, in the solution
this parameter will only appear as a common multiplier for all the
regression coefficients, and the regression equation is independent of
such a factor. We may therefore write the solution of (9.17) in the
form

(9.20)   $\alpha_i = \text{constant} \cdot \sum_k \hat r_{ik}\,\rho_k$

where $\hat r_{ik}$ is the adjoint of the observed correlation matrix.

Thus, what the observational data at hand give information
about is in reality not the true regression coefficients themselves
but only the law of linear transformation which permits us
to pass from the disturbance correlations $\rho_i$ to the regression
coefficients $\alpha_i$ or vice versa. This exhibits in a striking
fashion the limits of the information which is contained in a table of
correlation coefficients. From such a table we can deduce
some definite statistical regression only by a process which is
equivalent to making certain hypotheses about the disturbance
correlations. We shall later interpret some of the usual statistical
regressions in this light.
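The passage from disturbance correlations to regression coefficients is a single matrix product with the adjoint of the observed correlation matrix. The sketch below works in three variates; the matrix `R` and the vector `rho` are hypothetical, and the function names are illustrative only.

```python
def adjugate3(m):
    """Adjugate (transposed cofactor matrix) of a 3x3 matrix stored as nested lists."""
    def cofactor(i, j):
        rows = [a for a in range(3) if a != i]
        cols = [b for b in range(3) if b != j]
        minor = (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
                 - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])
        return (-1) ** (i + j) * minor
    return [[cofactor(j, i) for j in range(3)] for i in range(3)]


def coefficients_from_rho(r, rho):
    """Regression coefficients, up to a common factor, from disturbance correlations."""
    adj = adjugate3(r)
    return [sum(adj[i][k] * rho[k] for k in range(3)) for i in range(3)]


# Hypothetical observed correlations and assumed disturbance correlations.
R = [[1.0, 0.2, -0.1],
     [0.2, 1.0, 0.2],
     [-0.1, 0.2, 1.0]]
rho = [0.3, -0.2, 0.1]

alpha = coefficients_from_rho(R, rho)
# Determinant of R, recovered from the identity R * adj(R) = det(R) * unit matrix.
det_r = sum(R[0][k] * adjugate3(R)[k][0] for k in range(3))
```

Multiplying `alpha` back by `R` returns `rho` times the determinant of `R`, so a different hypothesis about `rho` gives a different regression from the same correlation table.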

The hypothesis that the shift is uncorrelated with all the observed
variates, i.e. $\rho_1 = \dots = \rho_n = 0$, can in practice not be
admitted. Indeed, this would (if we disregard the trivial case where
all the true regression coefficients are zero) entail the vanishing
of the observed correlation determinant. This follows immediately
from the fact that $(r_{ik})$ is the matrix of the linear system
(9.17).

The determination of the "true" regression coefficients
obtained by means of a given set of assumptions regarding the
$\rho$'s must of course be identical with the determination by means
of an assumption about the $\lambda$'s of Section 7. Suppose for
simplicity that we admit the three conditions (I), (II), (III) of
Section 7. Writing $x_k''$ for the disturbance part of $x_k$, the
moment $[u x_i]$ is then equal to

(9.21)   $[u x_i] = \sum_k a_k [x_k'' x_i] = a_i \lambda_i m_{ii}$.

The equation (9.15) therefore takes on the form

(9.22)   $\sum_k (m_{ik} - \lambda_k m_{kk} e_{ik})\,a_k = 0$

which is equivalent to (9.3).

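In non-normalised form the "true" regression defined by the
coefficients (9.20) may be written

(9.23)   $\sum_{ij} \mu_i\,\hat m_{ij}\,x_j = 0$

where $\hat m_{ij}$ is the adjoint of the observed moment matrix and
$\mu_i$ is the disturbance moment

(9.24)   $\mu_i = [u x_i]$

In determinant form this "true" regression may be written

(9.25)   $\begin{vmatrix} 0 & x_1 & \dots & x_n \\ \mu_1 & m_{11} & \dots & m_{1n} \\ \vdots & \vdots & & \vdots \\ \mu_n & m_{n1} & \dots & m_{nn} \end{vmatrix} = 0$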

This regression is, as one would expect, invariant for a
general linear transformation of the observational variates.
Indeed, making the transformation

(9.26)   $\bar x_i = \sum_k c_{ik} x_k$

where $(c_{ik})$ is a non-singular matrix, we get a new observational
moment matrix

(9.27)   $\bar m_{ij} = \sum_{hk} c_{ih}\,m_{hk}\,b_{kj}$   where $b_{kj} = c_{jk}$

(Compare formula (1.5) in "Correlation and Scatter..."). The
adjoint of (9.27) is

(9.28)   $\hat{\bar m}_{ij} = \sum_{hk} \hat b_{ih}\,\hat m_{hk}\,\hat c_{kj}$

$\hat b$ being the adjoint of $b$, and $\hat c$ the adjoint of $c$.

In each observation the shift $u$ will be unchanged by the
linear transformation; the transformation represents indeed
only another way of arranging the terms in the sum to the left
in (9.13), and this purely formal operation cannot influence the
value which "nature" has given this sum. The new disturbance
moment will consequently be

(9.29)   $\bar\mu_i = [u \bar x_i] = \sum_k c_{ik}\,\mu_k$

In other words $\mu_i$ is cogredient with $x_i$, which is the essential
fact that will ensure the invariance of the regression. The new
regression equation will now be
$\sum_{ij} \bar\mu_i\,\hat{\bar m}_{ij}\,\bar x_j = 0$, and inserting
here from (9.26), (9.28) and (9.29) we get

$\sum \mu_\alpha\,c_{i\alpha}\,\hat b_{ih}\,\hat m_{hk}\,\hat c_{kj}\,c_{j\beta}\,x_\beta = 0$

which reduces to

(9.30)   $|c|^2 \cdot \sum_{ij} \mu_i\,\hat m_{ij}\,x_j = 0$.

Since the determinant $|c|^2$ is different from zero, equations
(9.30) and (9.23) are the same.

In (9.11) we deduced limits for the "true" regression coefficient
in two variates by discussing the range of variation of the
$\lambda$'s. Will a similar study of the range of variation of the
$\rho$'s in (9.20) also furnish limits for the "true" regression
coefficients? For simplicity let us again take the case n = 2. By
(9.20) the regression equation in normalised coordinates will
now be

(9.31)   $\xi_1 = \beta\,\xi_2$

where

(9.32)   $\beta = \dfrac{1 - zr}{r - z}$,   $r = r_{12}$

(9.33)   $z = \rho_1 / \rho_2$

The function $\beta(z)$, whose derivative is
$\beta'(z) = \dfrac{1 - r^2}{(r - z)^2}$, will have one branch that
starts at $\beta = r$ for $z = -\infty$, increasing monotonically to
$+\infty$ for $z = r - 0$, and appearing again at $-\infty$ for
$z = r + 0$, from where it increases monotonically to $r$. When we
follow these branches, we see that $\beta$ lies between $r$ and $1/r$
when and only when $z$ has the opposite sign of $r$. Consequently: If
the disturbing correlations $\rho_1$ and $\rho_2$ have the same sign
when the observed $r$ is negative and if they have different signs
when $r$ is positive, then, and only then, will the "true" regression
lie between the two elementary regressions. If this condition is
not fulfilled, the "true" regression will fall outside of the sector
between the elementary regressions. In this latter case there
is nothing to prevent the "true" regression slope from assuming any
value between $-\infty$ and $+\infty$. This is another expression for
the fact that the limitation (9.11) is essentially connected with the
special assumptions (I), (II) and (III') of Section 7.
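The sign rule can be tried out directly on the function $\beta(z) = (1 - zr)/(r - z)$, which follows from (9.20) for two variates with $z = \rho_1/\rho_2$. The helper names below are illustrative, not taken from the text.

```python
def beta(z, r):
    """'True' slope of xi_1 on xi_2 as a function of z = rho_1 / rho_2."""
    if z == r:
        raise ValueError("the slope is infinite at z = r")
    return (1.0 - z * r) / (r - z)


def between_elementary(z, r):
    """True if beta(z) lies between the two elementary regressions r and 1/r."""
    lo, hi = sorted((r, 1.0 / r))
    return lo <= beta(z, r) <= hi
```

With $r = 0.4$, a negative ratio such as $z = -1$ gives $\beta = 1$, inside the sector from 0.4 to 2.5, whereas $z = 2$ gives $\beta = -0.125$, outside it.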


10. THE INTERPRETATION OF THE EMPIRICAL REGRESSIONS: 
ELEMENTARY, ORTHOGONAL, DUO-ORTHOGONAL AND DIAGONAL 
REGRESSIONS. 

By the formulae of the preceding Section we may now study
the nature of the various empirical regressions. Each of the
usual kinds of regressions determined statistically is connected
with a definite kind of assumption about the disturbing intensities
$\lambda$ or the disturbing correlations $\rho$.

Let us first take the elementary regressions. If we adopt the
p-th elementary regression as an expression for the systematic
connection between the variates, we assume that the coefficients
of the "true" regression (9.23) are proportional to those of the
p-th line in the adjoint of the observed moment matrix; in other
words, we assume that

(10.1)   $\sum_i \mu_i\,\hat m_{ij} = h\,\hat m_{pj}$   $(j = 1, 2, \dots, n)$

where $h$ is some constant independent of $j$.

From (10.1) follows immediately, provided that the
observed moment matrix is non-singular, that

(10.2)   $\mu_i = h$ $(i = p)$,   $\mu_i = 0$ $(i \ne p)$

In other words, adopting the p-th elementary regression
necessarily involves the assumption that there is some correlation
between the shift $u$ and the observed variate $x_p$, but no
correlation whatsoever between $u$ and any of the other observed
variates.

Next consider the orthogonal regression. It is defined as the
regression obtained by minimising the sumsquare of the deviations
measured perpendicularly to the regression plane. If this
is done before the variates are normalised, we obtain a regression
that is not even invariant for a change in units of
measurement. If the orthogonal regression is to be used at all
in a case where the units of measurement are conventional,
it should therefore be applied to the normalised variates. If
this is done, the regression coefficients $\alpha$ are nothing but the
solution of (9.3) when all the $\lambda$'s are put equal, and equal to
the smallest characteristic root, i.e. the smallest zero of $P(\lambda)$
defined by (2.2). In other words we may look upon the
normalised orthogonal regression as obtained by first assuming
that the disturbing intensity is the same for all the variates,
and then determining this common magnitude of the disturbing
intensity as the smallest number compatible with the assumption
that the systematic parts of the variates are rigorously
linearly dependent.
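This reading of the orthogonal regression can be sketched in three variates without any library. The block below finds the smallest characteristic root by scanning upward from zero and bisecting, then reads the coefficients from a row of the adjoint. The matrix `R` is hypothetical, and the scan assumes the smallest root is separated from the next one by more than the step; a library eigenvalue routine would be the usual choice in practice.

```python
def det3(m):
    """Determinant of a 3x3 matrix stored as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def shifted(r, lam):
    """The matrix r - lam * (unit matrix): equal intensity lam on every variate."""
    return [[r[i][j] - (lam if i == j else 0.0) for j in range(3)] for i in range(3)]


def smallest_root(r, step=1e-3):
    """Smallest zero of det(r - lam * I), by an upward scan followed by bisection."""
    lo = 0.0
    while det3(shifted(r, lo + step)) > 0.0:
        lo += step
    hi = lo + step
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if det3(shifted(r, mid)) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def adjugate_row(m, i):
    """Row i of the adjugate of a 3x3 matrix."""
    cols = [a for a in range(3) if a != i]
    row = []
    for j in range(3):
        rows = [a for a in range(3) if a != j]
        minor = (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
                 - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])
        row.append((-1) ** (i + j) * minor)
    return row


def orthogonal_regression(r):
    """Common intensity and normalised coefficients of the orthogonal regression."""
    lam = smallest_root(r)
    rows = [adjugate_row(shifted(r, lam), i) for i in range(3)]
    row = max(rows, key=lambda v: sum(x * x for x in v))
    pivot = max(row, key=abs)
    return lam, [x / pivot for x in row]


# Hypothetical observed correlation matrix.
R = [[1.0, 0.2, -0.1],
     [0.2, 1.0, 0.2],
     [-0.1, 0.2, 1.0]]
lam_min, alpha = orthogonal_regression(R)
```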

This suggests the generalisation of estimating — through some 
more or less plausible considerations of the concrete nature of 
the variates — a set of proportionality numbers which 

can roughly express the comparative amount of disturbance on 
the variates; and then determine a common factor y by the 
characteristic equation = 0. The development of this 

equation is easily obtained from the formulae of the preceding 
Sections. 

When we are primarily interested in the regression coefficient
between two special variates $x_p$ and $x_q$ (the others being
taken into the regression only to eliminate influences that it is
not wanted to study), we may for instance use the above
method by putting all the $\lambda$'s equal to zero except $\lambda_p$
and $\lambda_q$, which we may, if no further information is available,
put equal. The regression thus obtained may be called the
duo-orthogonal. The explicit formula for its computation is


(10.3) 


1 i /' 


n7+(A. 


')p( 




where e is a sign factor determined from the general appearance
of the adjoint correlation matrix, and


(10.4 a) 




+ A, 




...r) 


.)</( 






(10.4b) fJ— (A)p(+ A)g()^— 4 • A • A)p5( 

The last member in (10.3) is most convenient for the numerical
computation.

Also the elementary regressions can be interpreted in terms
of the $\lambda$'s. Indeed suppose that there is no disturbance at all
on the variates except on $x_p$. The true scatterance in the given
set will then be the value of (8.1) when all the $\lambda$'s are just
equal to zero except $\lambda_p$. The value which $\lambda_p$ must have
if the systematic variates in the set shall be linearly dependent,
that is if the true scatterance of the set shall vanish, will
consequently be

(10.5)   $\lambda_p = \dfrac{\Delta_{ij \dots k}}{\Delta_{ij \dots )p( \dots k}}$

where $\Delta$ denotes the observed scatterance and $)p($ indicates that
the variate $p$ is left out of the set.

Any row in the adjoint of the scatter function with the value
(10.5) inserted for $\lambda_p$ will then give the regression
coefficients. The elements in the p-th row of the adjoint thus
obtained will just turn out to be the coefficients of the p-th
elementary regression.
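The statement about the p-th row can be verified numerically. In the sketch below, which assumes three variates and a hypothetical correlation matrix `R`, the whole disturbance is placed on variate `p`, the intensity is set to the ratio of the full scatterance to the scatterance of the subset without `p`, and the p-th adjoint row is compared before and after.

```python
def det3(m):
    """Determinant of a 3x3 matrix stored as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def adjugate_row(m, i):
    """Row i of the adjugate of a 3x3 matrix."""
    cols = [a for a in range(3) if a != i]
    row = []
    for j in range(3):
        rows = [a for a in range(3) if a != j]
        minor = (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
                 - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])
        row.append((-1) ** (i + j) * minor)
    return row


def intensity_for_single_disturbance(r, p):
    """Intensity on variate p that makes the 'true' scatterance vanish."""
    keep = [i for i in range(3) if i != p]
    subset = (r[keep[0]][keep[0]] * r[keep[1]][keep[1]]
              - r[keep[0]][keep[1]] * r[keep[1]][keep[0]])
    return det3(r) / subset


# Hypothetical observed correlation matrix; all disturbance placed on variate 0.
R = [[1.0, 0.2, -0.1],
     [0.2, 1.0, 0.2],
     [-0.1, 0.2, 1.0]]
p = 0
lam_p = intensity_for_single_disturbance(R, p)
true_matrix = [row[:] for row in R]
true_matrix[p][p] -= lam_p

row_true = adjugate_row(true_matrix, p)   # "true" regression under this hypothesis
row_observed = adjugate_row(R, p)         # p-th elementary regression
```

The two rows coincide because the cofactors along row `p` never involve the diagonal element that was changed.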

The diagonal regression is the one obtained by determining
the signs of the regression coefficients from an inspection of
the signs of the elements in the adjoint correlation matrix, and
then putting the absolute value of the coefficients equal to the
square root of the diagonal elements in the adjoint. Obviously
this regression is the one obtained by assuming that the disturbing
intensities are such that the "cushion" effect (i.e. the
effect that prevents the observed scatterances in the (n - 1)
dimensional subsets from coming down to the value of the
"true" scatterances) is the same in all subsets. In other words,
the observed scatterances in these subsets are assumed
proportional to the true.

The diagonal regression may be computed by the formula 


(10.6) ^ ^ e - 

V rpp («/?... y) V 


where e is the sign factor indicated.

In the above analysis the regressions are not determined by
any least square minimalisation procedure, but simply by
specifying certain assumptions about the disturbing intensities
or the disturbing correlations and then solving the equations
that must hold exactly if the assumptions in question hold good.
It seems that this is a more logical procedure than just to least
square minimise certain deviations defined in a more or less
empirical manner. If a least square process is to be taken as
the basis for the determination of the regression coefficients,
it should at least be formulated in more general terms than
those usually employed. It should be formulated so that the
nature of the specialisation adopted in each particular case is
clearly exhibited. The following is a suggestion for such a
treatment of the problem.

Let (9.0) be the empirical regression plane. Suppose that the
$a$'s are to be determined by minimising the sumsquares of the
deviations from the plane measured in a direction whose direction
numbers are $c_1 \dots c_n$. These direction numbers may be a
set of given constants, or more generally they may be given
functions of the $a$'s. The nature of these functions is just
characteristic for the nature of the minimalisation process
considered. We shall treat the problem on the assumption that the
minimalisation is made in the original (non-normalised) variates.
The corresponding solution obtained in the normalised
variates is obtained simply by assuming all the observed
standard deviations (or sumsquares) to be unity.

Let $x_i$ be the coordinates of a given observation point and
$y_i$ the coordinates of the point of intersection between the
regression plane and the straight line through $x_i$ with direction
$c_i$. We then have


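(10.7)   $y_i = x_i - c_i\,\dfrac{\sum_k a_k x_k}{\sum_k a_k c_k}$

so that the square distance is

(10.8)   $\sum_k (y_k - x_k)^2 = \sum_k c_k^2 \cdot \dfrac{(\sum_k a_k x_k)^2}{(\sum_k a_k c_k)^2}$

Taking the sum of this over all observations, we get

(10.9)   $\lambda = \dfrac{\sum_k c_k^2}{(\sum_k a_k c_k)^2} \cdot \sum_{ij} m_{ij}\,a_i a_j$.

Equating the partial derivatives of $\lambda$ with respect to the $a$'s
to zero we get

(10.10)   $\sum_k m_{ik} a_k = \lambda\mu \left( c_i + \sum_k a_k c_{ki} - \mu \sum_k c_k c_{ki} \right)$

where

(10.11)   $\mu = \sum_k a_k c_k \,/\, \sum_k c_k^2$,

(10.12)   $c_{ki} = \partial c_k / \partial a_i$.

It does not restrict generality if we assume the $c_k$ to be such
functions of the $a_k$ that

(10.13)   $c_1^2 + \dots + c_n^2 = 1$

for all values of the $a_k$.

If this is the case, (10.10) reduces to

(10.14)   $\sum_k m_{ik} a_k = \lambda\mu \left( c_i + \sum_k a_k c_{ki} \right)$   $(i = 1, 2, \dots, n)$.

In this system the $c_i$ and $c_{ki}$ must be supposed to be given
functions of the $a_k$. In general the system (10.14) will therefore
furnish a determination of the $a_k$.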

If we assume that the minimalisation is done in a direction
that is independent of the inclination of the regression plane,
(10.14) reduces further to

(10.15)   $\sum_k m_{ik} a_k = \lambda\mu\,c_i$

where now $\lambda\mu$ may be looked upon as an arbitrary parameter,
which only defines the length of the vector $c_i$. The regression
equation is of course independent of this length.

If all the $c_i$ in (10.15) are put equal to zero, except one, we
get the corresponding elementary regression.
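The fixed-direction case can be sketched for two variates. Under (10.15) the coefficients are proportional to the adjoint of the moment matrix applied to the direction numbers. The data points and the direction below are invented for illustration, and the function names are not from the text.

```python
def moments(points):
    """Moment matrix m_ij = sum of x_i * x_j over observations (variates centred)."""
    n = len(points[0])
    return [[sum(pt[i] * pt[j] for pt in points) for j in range(n)] for i in range(n)]


def sum_squares(a, c, m):
    """Sum of squared deviations from the plane sum(a_k x_k) = 0, measured along c."""
    n = len(a)
    quad = sum(m[i][j] * a[i] * a[j] for i in range(n) for j in range(n))
    return sum(ck * ck for ck in c) * quad / sum(ak * ck for ak, ck in zip(a, c)) ** 2


def fixed_direction_regression(m, c):
    """Two-variate solution of (10.15): a proportional to adj(m) applied to c."""
    adj = [[m[1][1], -m[0][1]], [-m[1][0], m[0][0]]]
    return [adj[i][0] * c[0] + adj[i][1] * c[1] for i in range(2)]


# Hypothetical centred observations of two variates.
POINTS = [(-2.0, -1.5), (-1.0, -0.5), (0.0, 0.5), (1.0, 0.0), (2.0, 1.5)]
M = moments(POINTS)
C = (0.6, 0.8)                       # fixed unit direction of measurement

a = fixed_direction_regression(M, C)
best = sum_squares(a, C, M)

# Measuring only along the first axis reproduces the regression of x_1 on x_2.
a_elementary = fixed_direction_regression(M, (1.0, 0.0))
slope_1_on_2 = -a_elementary[1] / a_elementary[0]
```

Any other coefficient vector gives a sum of squares at least as large as `best`, and the last two lines recover the elementary slope $m_{12}/m_{22}$.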

Putting in (10.14)

(10.16)   $c_i = \dfrac{a_i}{\sqrt{\sum_k a_k^2}}$

means that we consider the orthogonal regression. In this
case we get

(10.17)   $c_{ki} = \dfrac{e_{ki}}{\sqrt{\sum_h a_h^2}} - \dfrac{a_k a_i}{\left(\sum_h a_h^2\right)^{3/2}}$

Also in this case we therefore have

(10.18)   $\sum_k a_k c_{ki} = 0$   $(i = 1, 2, \dots, n)$

so that we get back to equation (10.15).

Further, we now have $\mu c_i = a_i$, so that (10.15) reduces to

(10.19)   $\sum_k (m_{ik} - \lambda e_{ik})\,a_k = 0$

which is the previously encountered equation for determining
the coefficients of the orthogonal regression.

Any number of other regressions may be defined by specifying
the nature of the functions $c_i$. A rather extensive and
interesting class is obtained by considering all functions of
$a_1 \dots a_n$ for which (10.18) is fulfilled.

For all functions of this class the problem will be reduced
to a system of the form (10.15). If further the functions $c_i$ are
of the form

(10.20)   $c_i = s_i\,a_i\,\Phi$

where $s_i$ is a given constant depending on $i$ but independent of
$a_1 \dots a_n$, and $\Phi$ a function of $a_1 \dots a_n$ and
$s_1 \dots s_n$ but independent of $i$, then the problem reduces to
the solution of a regular characteristic equation of the form

(10.21)   $|m_{ik} - \Lambda s_i e_{ik}| = 0$

$\Lambda$ being the unknown. Indeed, in this case (10.15) reduces to
the system

(10.22)   $\sum_k (m_{ik} - \Lambda s_i e_{ik})\,a_k = 0$,   where $\Lambda = \lambda\mu\Phi$.

The system (10.22) has a non-trivial solution only when $\Lambda$
satisfies (10.21).

An example of a regression that comes under the above class 
is the one obtained by constructing through each observation 
point a plane P perpendicular to a certain set of the coordinate 
axes and then minimising the sumsquare of the deviations 
measured within P and perpendicularly to the manifold of in- 
tersection between P and the regression plane. This regression 
may perhaps be called the subset-orthogonal. The duo-orthogo- 
nal mentioned above is a special case. 

Many other more or less plausible procedures may be derived
by other specialisations of the functions $c_i$. However, if a logical
interpretation of all these various procedures is wanted it
seems that we must bring the problem back to a discussion of
the effects of the parameters $\lambda$ and $\rho$ along the lines
indicated in the beginning of this Section.


11. THE AMOUNT OF INDETERMINATENESS IN REGRESSION SLOPES
IN MULTICOLLINEAR VARIATES. THE PERSISTENCY EFFECT PRODUCED
WHEN A GIVEN INTER-COEFFICIENT HAS THE SAME SIZE
IN ALL SUBSETS.

Utilising the results of the preceding Section we shall now
study more closely the amount of indeterminateness in regression
slopes in multicollinear variates.

Suppose first that we have a set of n variates $x_1 \dots x_n$
between which there exist the two independent and exact linear
relations

(11.1)   $a_1 x_1 + \dots + a_n x_n = 0$

(11.2)   $b_1 x_1 + \dots + b_n x_n = 0$.

Let $p$ and $q$ be two arbitrary multipliers and let us take the
sum of (11.1) and (11.2) multiplied by $p$ and $q$ respectively;
this gives the new equation

(11.3)   $c_1 x_1 + \dots + c_n x_n = 0$

where

(11.4)   $c_k = p a_k + q b_k$.

Since $p$ and $q$ are arbitrary, the coefficients of the new equation
also contain an element of arbitrariness. The extent of this
arbitrariness is such that in the new equation we may make the
regression coefficient between any given set of two variates,
say between $x_i$ and $x_j$, equal to any number we like, provided
only that

(11.5)   $\begin{vmatrix} a_i & a_j \\ b_i & b_j \end{vmatrix} \ne 0$.

The above rule is immediately verified if we write (11.3) in the
form

(11.6)   $x_i = \beta x_j + \dots$

The coefficient $\beta$ of this equation will then be

(11.7)   $\beta = -\dfrac{p a_j + q b_j}{p a_i + q b_i}$

By a suitable choice of $p$ and $q$ the expression (11.7) may under
the assumption (11.5) be made equal to any preassigned
quantity.
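The claim can be made concrete by solving for the multipliers. Setting the slope $-c_j/c_i$ of the combined equation equal to a target value gives one admissible pair, $p = b_j + \beta b_i$ and $q = -(a_j + \beta a_i)$, for which $c_i$ equals the determinant (11.5). The sketch below uses exact fractions; the coefficient vectors are hypothetical and the function names are illustrative.

```python
from fractions import Fraction


def multipliers_for_slope(a, b, i, j, beta):
    """Multipliers (p, q) that give the combined equation the slope beta of x_i on x_j."""
    if a[i] * b[j] - a[j] * b[i] == 0:
        raise ValueError("determinant (11.5) vanishes: the slope cannot be chosen")
    p = b[j] + beta * b[i]
    q = -(a[j] + beta * a[i])
    return p, q


def combine(a, b, p, q):
    """Coefficients c_k = p * a_k + q * b_k of the derived equation (11.3)."""
    return [p * ak + q * bk for ak, bk in zip(a, b)]


def slope(c, i, j):
    """Regression coefficient of x_i on x_j in the equation sum(c_k x_k) = 0."""
    return -c[j] / c[i]


# Two hypothetical exact relations between four variates.
A = [Fraction(v) for v in (1, 2, -1, 0)]
B = [Fraction(v) for v in (0, 1, 1, -1)]

target = Fraction(7, 2)
p, q = multipliers_for_slope(A, B, 0, 1, target)
c = combine(A, B, p, q)
```

Once `p` and `q` are fixed in this way the remaining coefficients of `c` are determined, which is the one-dimensional arbitrariness of the two-equation case.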

Of course, if the multipliers $p$ and $q$ are chosen so as to make
$\beta$ equal to a given number, the other regression coefficients will
be determined. In other words, if there exist two independent
equations of the form (11.1) and (11.2), we are only at liberty
to choose one of the regression coefficients in the derived
equation (11.6), that is to say the regression coefficients in the
derived equation have a one-dimensional arbitrariness.

More generally, suppose that there exist $\varkappa$ ($\varkappa < n$)
independent equations of the form

(11.8)   $\sum_k a_{Kk}\,x_k = 0$   $(K = 1, 2, \dots, \varkappa)$

The condition that these equations shall be independent is
equivalent to the condition that the $\varkappa$-rowed and n-columned
matrix

(11.9)   $\begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{\varkappa 1} & \dots & a_{\varkappa n} \end{pmatrix}$

shall be of rank $\varkappa$. If this condition is fulfilled, the
$\varkappa$ equations considered are always compatible when
$\varkappa < n$. Further, suppose that the $\varkappa$-rowed and
two-columned matrix

(11.10)   $\begin{pmatrix} a_{1i} & a_{1j} \\ a_{2i} & a_{2j} \\ \vdots & \vdots \\ a_{\varkappa i} & a_{\varkappa j} \end{pmatrix}$

is of rank two, which means that the $\varkappa$ numbers
$a_{1i} \dots a_{\varkappa i}$ are not proportional to the $\varkappa$
numbers $a_{1j} \dots a_{\varkappa j}$: at least one pair falls
out of proportion. Then the regression coefficient between $x_i$
and $x_j$ in the general equation that now exists between the
variates may be chosen quite arbitrarily. Indeed, from the
equations (11.8) we may now deduce an equation of the form
(11.3), where

(11.11)   $c_i = \sum_K p_K\,a_{Ki}$

$p_K$ being a set of arbitrary multipliers. The regression
coefficient of $x_i$ on $x_j$ in this new equation is

(11.12)   $\beta = -\dfrac{\sum_K p_K\,a_{Kj}}{\sum_K p_K\,a_{Ki}}$

and if the condition specified under (11.10) is fulfilled, we may
by a suitable choice of the $p_K$ make (11.12) equal to any
number we please.

Still more generally, if (ij...k) is a $\nu$-dimensional subset,
and the $\varkappa$-rowed and $\nu$-columned matrix

(11.13)   $\begin{pmatrix} a_{1i} & a_{1j} & \dots & a_{1k} \\ \vdots & \vdots & & \vdots \\ a_{\varkappa i} & a_{\varkappa j} & \dots & a_{\varkappa k} \end{pmatrix}$

is of rank $\nu$, then all the $\nu$ coefficients $c_i, c_j \dots c_k$
in the general equation between the variates may be chosen
arbitrarily. This is seen by considering

(11.14)   $\sum_K p_K\,a_{K\alpha} = c_\alpha$   $(\alpha = i, j, \dots, k)$

as a system of $\nu$ equations in the $p_K$. Retaining only such a
set of $\nu$ magnitudes $p_K$ as will make the coefficient matrix of
(11.14) non-singular (and putting all the other $p_K$ equal to zero),
we see that a solution in the $p_K$ is possible for any selection of
the $c_\alpha$. If we are only interested in the proportions between
the regression coefficients, we may now therefore say that the
coefficients have a $(\nu - 1)$-dimensional arbitrariness. (11.10)
represents the case $\nu = 2$, and at the other extreme we have the
case $\nu = n$; in this case the only possible values of the variates
are $x_1 = \dots = x_n = 0$, so that all the regression coefficients,
independently of each other, may now be put equal to any values
we please.

Of course, in none of the situations discussed above (when
$\varkappa > 1$) does it make sense to speak of the regression equation
connecting the variates, since no such determinate equation exists.

If $\varkappa > 1$, some sort of side conditions may be considered
which will make the regression coefficients determinate. Consider,
for instance, the case where we have four variates $x_1 \dots x_4$
satisfying two independent relations. Any set of three variates
will then form a linearly dependent set. If (11.5) is fulfilled for
i = 1, j = 2, it will now have no meaning to speak of the regression
coefficient between $x_1$ and $x_2$ in the big set (1234). But
it will have a meaning to speak of this regression coefficient
in any of the three dimensional subsets containing (12), that
is in the sets (123) and (124). We can express this in the usual
regression coefficient notation by saying that $b_{12.34}$ has no
meaning, while $b_{12.3}$ and $b_{12.4}$ have meanings. The latter two
coefficients can be looked upon as those obtained from (11.12)
by imposing in one case the side condition $c_4 = 0$, and in the other
the condition $c_3 = 0$, the $c$'s being the coefficients in the general
equation (11.3) (which now contains the four variates $x_1 \dots x_4$).


The one important exception to the above is when (11.5) is
not fulfilled. In this case it will have a meaning to speak of
the regression coefficient between $x_1$ and $x_2$ without specifying
the subset considered, or giving any other kind of side condition.
As an example consider the case where $x_2$ is lacking
in both equations, that is, we have $a_2 = b_2 = 0$. It will then
have a meaning to speak of the regression coefficient of $x_1$ on
$x_2$ even in the big multiply collinear set containing all the four
variates. Indeed, whatever multipliers we use to obtain the
new general equation, the regression coefficient of $x_1$ on $x_2$
would in this new equation be zero. More generally this
regression coefficient of $x_1$ on $x_2$ would be independent of the
multipliers, even if $x_2$ was not lacking in both original equations,
provided only that the regression coefficient between $x_1$
and $x_2$ was the same in both these equations, in other words
provided only that

$\begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix} = 0$.

In this case it is indifferent for the definition of the regression
coefficient which one of the various derived equations we consider.
In particular the coefficient considered is the same no matter which
one of the various subsets we consider. We shall refer to this fact by
saying that the regression coefficient in question now shows a
persistency effect even in the multicollinear set.


In the above example we only had two equations ($\varkappa = 2$). In
the general case, there will be produced a persistency effect for
the regression coefficient between the two special variates $x_i$
and $x_j$ whenever the matrix (11.10) is of rank one. And if the
matrix (11.13) is of rank one, there will be produced a persistency
effect for the regression coefficient between any pair
of variates in the set (ij...k).

So much for the situations that arise when the variates fulfil
exactly certain linear relations. Now suppose that each of the
n variates is the sum of a systematic component and a
disturbance as explained in Section 7. Suppose that there exist
two or more independent linear relations between the systematic
parts of these variates, but that we are not aware of this
multicollinearity and proceed to determine an empirical regression
equation between all the variates.

If the empirical regression equation is taken in the diagonal
form (10.6), the squares of the regression coefficients, when
normalised variates are used, are simply the correlation
determinants $\Delta_{)i(}$, that is the observed scatterances of the
subsets obtained by leaving out one variate $i$. It is therefore to be
expected that the determination of the regression coefficients in the
present case will be particularly strongly influenced by a cushion
effect similar to the one we studied in Section 8 for scatterances,
minimal roots etc. Let us see what form this will take in the present
case. By (8.4) the determinant $\Delta_{)i(}$ is equal to

(11.15)   $\Delta_{)i(} = \Gamma_{)i(} + \sum_\alpha \lambda_\alpha \Gamma_{)i\alpha(} + \sum_{\alpha<\beta} \lambda_\alpha \lambda_\beta \Gamma_{)i\alpha\beta(} + \dots + \prod_{\alpha \ne i} \lambda_\alpha$

where $\Gamma_{)i(}$, $\Gamma_{)i\alpha(}$ ... are the "true" scatterances
in the various subsets.

Let us first suppose that the systematic parts of the n
variates that form the big set are linearly dependent, but that
there is some (n - 1)-dimensional subset whose systematic parts
are not linearly dependent. This means that at least for some $i$
the "true" scatterance $\Gamma_{)i(}$ will be different from zero. It now
has a meaning to speak of the regression equation between the
systematic parts of the variates, and $\Gamma_{)i(}$ is the square of the
"true" regression coefficient that occurs in front of the i-th variate
when the "true" regression is written in homogeneous form. Further,
let us suppose that not only is $\Gamma_{)i(}$ (for some $i$) in the right
member of (11.15) different from zero, but that the disturbing
intensities $\lambda$ are small enough to make this $\Gamma$ the principal
term in the right member of (11.15). In this case it has a meaning to
take the observed $\Delta_{)i(}$ as an approximation to the square of the
"true" regression coefficient.

But if there exist exactly two independent equations between
the systematic parts of the variates, then all the $\Gamma_{)i(}$ will be
zero but not all the $\Gamma_{)i\alpha(}$. Hence, all the first terms in the
right member of (11.15) will vanish, and the principal terms in the
empirically determined diagonal regression coefficients will now
depend essentially on the disturbing intensities. Any other type
of empirical regression (elementary, orthogonal etc.) would
show a similar effect.

A similar consideration applies if the systematic parts of the
variates are connected by a certain number $\varkappa$ of independent
equations. The only difference is that now all the first
$(\varkappa - 1)$ terms in the right member of (11.15) will vanish.

To illustrate the above general tendency let us consider three
observational variates $x_1$, $x_2$, $x_3$ and suppose that there
exists a structural relation of the form

(11.16)  $x_1 = \beta_{12} x_2 + \beta_{13} x_3 + y$

where $\beta_{12}$ and $\beta_{13}$ are the "true" regression coefficients and $y$ a
disturbance. Multiplying (11.16) by $x_i$ ($i = 2, 3$) and extending
a summation over all the observations, we get

(11.17)  $m_{1i} = \beta_{12} m_{2i} + \beta_{13} m_{3i} + [x_i y]$.

The $m$'s are the observed moments and $[x_i y]$ the unobserved
disturbance moments. If on the basis of the observations of
$x_1$, $x_2$, $x_3$ we determine the elementary regression of $x_1$ on $x_2$ and
$x_3$ we get

(11.18)  $b_{12.3} = \dfrac{m_{12} m_{33} - m_{13} m_{23}}{m_{22} m_{33} - m_{23}^2}$

and similarly for $b_{13.2}$. Inserting into (11.18) the expressions for
$m_{12}$ and $m_{13}$ taken from (11.17) we obtain


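(11.19)  $b_{12.3} = \dfrac{\begin{vmatrix} m_{22} & m_{23} \\ m_{32} & m_{33} \end{vmatrix} \cdot \beta_{12} + \begin{vmatrix} [x_2 y] & m_{23} \\ [x_3 y] & m_{33} \end{vmatrix}}{\begin{vmatrix} m_{22} & m_{23} \\ m_{32} & m_{33} \end{vmatrix}}$

If $x_2$ and $x_3$ are not exactly collinear, (11.19) may be written
in the form

(11.20)  $b_{12.3} = \beta_{12} + \begin{vmatrix} \rho_2 & r_{23} \\ \rho_3 & 1 \end{vmatrix} \cdot \dfrac{\sqrt{[yy] / [x_2 x_2]}}{\Delta_{23}}$

where $\Delta_{23}$ is the scatterance in the set (23) and $\rho_i$ the
correlation between the disturbance variate and the observed $x_i$.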

This illustrates that the error term, that is the second term
in (11.20), becomes all the more important the more perfect
the correlation between $x_2$ and $x_3$. The error term is inversely
proportional to the scatterance in the set (23) so that as this
scatterance decreases towards zero, the error term will increase
beyond any limit, and may therefore easily become dominating
besides the "true" term $\beta_{12}$.

Apart from the trivial coincidence when the determinant to
the right in (11.20) vanishes, it is only when there is no
disturbance, i.e. when $y = 0$, that the result is independent of the
amount of correlation that exists between $x_2$ and $x_3$. In this case
the elementary regression coefficient $b_{12.3}$ will by (11.20)
always be exactly equal to the "true" coefficient $\beta_{12}$, provided
only that $x_2$ and $x_3$ are not exactly linearly connected, in which
case (11.19) shows that $b_{12.3}$ becomes of the form $0/0$.
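
The decomposition (11.20) can be checked numerically. The following is only a sketch under stated assumptions: the sample size, the values of the "true" coefficients, the size of the disturbance and the three degrees of collinearity between $x_2$ and $x_3$ are all chosen arbitrarily for illustration, and the function names are hypothetical. The variates are measured from their means, and the moments are plain sums of products over the observations.

```python
import numpy as np

def moment(a, b):
    """Product moment: sum of products over all observations."""
    return float(np.dot(a, b))

def elementary_b12_3(x1, x2, x3):
    """Elementary regression coefficient of x1 on x2 in the set (123), as in (11.18)."""
    num = moment(x1, x2) * moment(x3, x3) - moment(x1, x3) * moment(x2, x3)
    den = moment(x2, x2) * moment(x3, x3) - moment(x2, x3) ** 2
    return num / den

def right_member_11_20(x2, x3, y, beta12):
    """Right member of (11.20): returns ('true' term + error term, error term)."""
    r23 = moment(x2, x3) / np.sqrt(moment(x2, x2) * moment(x3, x3))
    rho2 = moment(x2, y) / np.sqrt(moment(x2, x2) * moment(y, y))
    rho3 = moment(x3, y) / np.sqrt(moment(x3, x3) * moment(y, y))
    delta23 = 1.0 - r23 ** 2                      # scatterance in the set (23)
    error = (rho2 - r23 * rho3) * np.sqrt(moment(y, y) / moment(x2, x2)) / delta23
    return beta12 + error, error

def centred(v):
    """Measure a variate from its mean."""
    return v - v.mean()

rng = np.random.default_rng(0)                    # arbitrary seed
n, beta12, beta13 = 200, 1.0, -0.5                # illustrative values only
x2 = centred(rng.standard_normal(n))
y = centred(0.1 * rng.standard_normal(n))         # the disturbance

errors = {}
for noise in (1.0, 0.1, 0.01):                    # smaller noise: x3 closer to x2
    x3 = centred(x2 + noise * rng.standard_normal(n))
    x1 = beta12 * x2 + beta13 * x3 + y            # structural relation (11.16)
    b = elementary_b12_3(x1, x2, x3)
    rhs, err = right_member_11_20(x2, x3, y, beta12)
    errors[noise] = (b, rhs, err)
    print(f"noise={noise}: b12.3={b:.4f}  right member={rhs:.4f}  error term={err:.4f}")
```

In each run the computed coefficient and the right member of (11.20) agree up to rounding, while the size of the error term is governed by the scatterance $\Delta_{23}$ in the denominator.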

12. THE STABILITY OF A REGRESSION COEFFICIENT UNDER THE 
INCLUSION OF AN EXTRANEOUS VARIATE. 

The fictitious determinateness studied at the end of the last
Section is even such that it may exhibit a certain measure of
stability as we go from one set of variates to another. As an
example suppose that we have three variates $x_1$, $x_2$, $x_3$ between
whose systematic parts there exist two independent relations,
so that $F_{)1(} = F_{23} = 0$, $F_{)2(} = F_{13} = 0$, $F_{)3(} = F_{12} = 0$. For the
lower order true scatterances we have by (8.1) $F_{)12(} = F_3 = 1 - \lambda_3$
etc., so that by (11.15)

(12.1)
$\Delta_{)1(} = \Delta_{23} = \lambda_2 + \lambda_3 - \lambda_2 \lambda_3$
$\Delta_{)2(} = \Delta_{13} = \lambda_1 + \lambda_3 - \lambda_1 \lambda_3$
$\Delta_{)3(} = \Delta_{12} = \lambda_1 + \lambda_2 - \lambda_1 \lambda_2$

The square of the regression coefficient of $x_1$ on $x_2$ as
"determined" by the diagonal regression in the set (123) will
consequently be a ratio between erratic elements, namely

(12.2)  $\dfrac{\Delta_{)2(}}{\Delta_{)1(}} = \dfrac{\lambda_1 + \lambda_3 - \lambda_1 \lambda_3}{\lambda_2 + \lambda_3 - \lambda_2 \lambda_3}$


We would have found a similar result for the elementary
regression coefficients (that is for the coefficients obtained by
minimising in the direction of the axis of one of the variates).
Now suppose that we add a variate $x_4$ which has nothing at all
to do with the other variates. The easiest way to express this
in our formulae is to say that $x_4$ has no random part,
i.e. $\lambda_4 = 0$, and that its systematic part is uncorrelated with
all the other variates. Let us first make use only of the latter
property but temporarily, for the symmetry of the formulae,
preserve the letter $\lambda_4$. We now have $F_{12} = F_{13} = F_{23} = 0$ as
before and, the systematic part of $x_4$ being uncorrelated with those
of the other variates, $F_{14} = (1-\lambda_1)(1-\lambda_4)$,
$F_{24} = (1-\lambda_2)(1-\lambda_4)$, $F_{34} = (1-\lambda_3)(1-\lambda_4)$;
consequently

$\Delta_{)1(} = \Delta_{234} = \lambda_2 (1-\lambda_3)(1-\lambda_4) + \lambda_3 (1-\lambda_2)(1-\lambda_4)$
$\qquad + \lambda_4 (1-\lambda_2)(1-\lambda_3) + \lambda_2 \lambda_3 (1-\lambda_4)$
$\qquad + \lambda_2 \lambda_4 (1-\lambda_3) + \lambda_3 \lambda_4 (1-\lambda_2)$
$\qquad + \lambda_2 \lambda_3 \lambda_4$

Collecting the terms of this expression we find that it reduces to

(12.3)  $\Delta_{234} = (1 - \lambda_4) \cdot \Delta_{23} + \lambda_4$.

Therefore if $\lambda_4 = 0$, we get $\Delta_{234} = \Delta_{23}$. And similarly for
$\Delta_{)2(} = \Delta_{134}$, etc. In other words: the disturbances will tend to give
the same spurious value to $\Delta_{234}$ as to $\Delta_{23}$. The same would apply
to the scatterances in other subsets. This means that the
diagonal regression is virtually unchanged by the inclusion of
the extraneous variate. And the same applies to the elementary
regression.
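
The collection of terms leading to (12.3) is a piece of pure algebra and can be checked mechanically. The sketch below does so in exact rational arithmetic over a small grid of intensities; the grid and the function names are assumptions made only for this illustration.

```python
from fractions import Fraction
from itertools import product

def delta_23(l2, l3):
    """Observed scatterance in the set (23), first line of (12.1)."""
    return l2 + l3 - l2 * l3

def delta_234_expanded(l2, l3, l4):
    """The seven-term expansion that precedes (12.3), term by term."""
    return (l2 * (1 - l3) * (1 - l4)
            + l3 * (1 - l2) * (1 - l4)
            + l4 * (1 - l2) * (1 - l3)
            + l2 * l3 * (1 - l4)
            + l2 * l4 * (1 - l3)
            + l3 * l4 * (1 - l2)
            + l2 * l3 * l4)

def delta_234_collected(l2, l3, l4):
    """The collected form (12.3)."""
    return (1 - l4) * delta_23(l2, l3) + l4

# Illustrative grid of intensities 0, 0.2, ..., 1.0 as exact fractions.
grid = [Fraction(k, 10) for k in range(0, 11, 2)]
all_equal = all(
    delta_234_expanded(a, b, c) == delta_234_collected(a, b, c)
    for a, b, c in product(grid, repeat=3)
)
print("expansion equals (12.3) on the whole grid:", all_equal)

# With no random part in the extraneous variate the scatterance is unchanged.
l2, l3 = Fraction(1, 10), Fraction(1, 5)
print(delta_234_collected(l2, l3, Fraction(0)) == delta_23(l2, l3))
```

Both forms are equal to $1 - (1-\lambda_2)(1-\lambda_3)(1-\lambda_4)$, which is why the comparison holds exactly.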

This effect appears in its purest form when the new variate is 
completely uncorrelated with the systematic parts of the old 
variates. But the tendency will be the same even if there should 
be some correlation between the new and old systematic 
variates. 

A regression coefficient will of course also be relatively
stable under the inclusion of an extraneous variate when the regression
slope in question expresses a significant connection between the
variates (when the original set was not multiply collinear), but
in this case the stability is less interesting from the confluency
view-point.


13. THE DEGRADATION EFFECT. 

When we have included so many variates in the regression 
analysis that a situation is reached where the systematic parts 
of the variates are multilinearly connected, the empirical 
regression coefficient will, as we have seen, depend essentially 
on the disturbances; they will for instance not reveal the true 
magnitudes which the regression coefficients have in the lower 
sets that were exactly and simply collinear. Can anything be 
said in general about what values the regression coefficients 
in the multilinear sets tend to assume? 

I believe that some such general rules may be formulated, 
although I have not at the moment any exact proof of it. It 
seems that there exists a degradation effect of regression 
coefficients in the sense that when too many variates are in- 
cluded so that we get a multiply collinear set, the coefficients 
tend back to the gross value they had in the lowest sets with 
the poorest fit, that is to say the coefficients tend back to the 
values they had before the systematic influence connected with 
the other relevant variates was eliminated. 

Suppose, for instance, that there are four observational
variates, and that there exist good linear relations in all the
three-sets, while there is only very little correlation in the
two-sets. The elementary regression coefficient of $x_1$ on $x_2$
determined in the set (12), which we denote $b_{12}$, will, to
a large extent, be determined by exterior influences. Adding
the variate 3 we get a regression coefficient of $x_1$ on $x_2$,
namely $b_{12.3}$, that may be entirely different from $b_{12}$; $b_{12}$ may
for instance indicate a positive connection between $x_1$ and $x_2$,
while $b_{12.3}$ indicates a negative connection. On our
assumptions the fit in the three-set (123) is good, and $b_{12.3}$ would
actually be a significant expression for the connection between
$x_1$ and $x_2$, but $b_{12}$ would not express any systematic connection
between $x_1$ and $x_2$.

Now suppose that we add still another variate $x_4$. On the
assumption that not only (123) but also one, or which amounts
to the same, all of the other three-sets are linearly connected,
the degradation of the coefficient considered would now take
place: there would be a tendency for $b_{12.34}$ to come back to
the value $b_{12}$; instead of improving the result by adding $x_4$
we come back to the poorest of all the results obtained so far.

What happens is very much the same as what happened to 
the man who climbed up a ladder, and insisted on taking still 
another step after he had reached the top of the ladder. 

The above rule can be illustrated and its general validity
rendered plausible by the following considerations. Suppose
that we have 4 observational variates which are made up as
linear combinations of the 6 basic variates $y_1, \dots, y_6$ in the
following way

(13.1)
$x_1 = y_1 + 0.1\, y_3$
$x_2 = y_2 + 0.1\, y_4$
$x_3 = y_1 + y_2 + 0.1\, y_5$
$x_4 = y_1 - y_2 + 0.1\, y_6$

In other words $y_1$ and $y_2$ constitute that part of the basic
variates that produces the systematic connection within the
observational set (1234), and $y_3$, $y_4$, $y_5$ and $y_6$ constitute
disturbances.

If the variates $y$ are determined by independent random
drawings, the observed $x$ will show certain linear connections.
The systematic parts of the $x_i$ will indeed be

(13.2)
$x'_1 = y_1$
$x'_2 = y_2$
$x'_3 = y_1 + y_2$
$x'_4 = y_1 - y_2$

And eliminating the basic variates $y_1$ and $y_2$ from these
equations we see that any three-set of the systematic parts of the
observational variates will satisfy a linear equation. Taking the
coefficients of these equations as the "true" regression
coefficients we see that the "true" regression equations will be:

(13.3)
In the set not containing No. 1:   $2x_2 - x_3 + x_4 = 0$
In the set not containing No. 2:   $2x_1 - x_3 - x_4 = 0$
In the set not containing No. 3:   $x_1 - x_2 - x_4 = 0$
In the set not containing No. 4:   $x_1 + x_2 - x_3 = 0$

Of these four equations it is, of course, only two that are
linearly independent.

By (7.6) the observed crossmoments will be

(13.4)
$m_{12} = s_{12} + 0.1\,(s_{14} + s_{23}) + 0.01\, s_{34}$
$m_{13} = 1 + s_{12} + 0.1\,(s_{15} + s_{13} + s_{23}) + 0.01\, s_{35}$
$m_{23} = 1 + s_{12} + 0.1\,(s_{25} + s_{14} + s_{24}) + 0.01\, s_{45}$
$m_{14} = 1 - s_{12} + 0.1\,(s_{16} + s_{13} - s_{23}) + 0.01\, s_{36}$
$m_{24} = -1 + s_{12} + 0.1\,(s_{26} + s_{14} - s_{24}) + 0.01\, s_{46}$
$m_{34} = 0.1\,(s_{16} + s_{26} + s_{15} - s_{25}) + 0.01\, s_{56}$
$m_{11} = 1.01 + 0.2\, s_{13}$
$m_{22} = 1.01 + 0.2\, s_{24}$
$m_{33} = 2.01 + 2 s_{12} + 0.2\,(s_{15} + s_{25})$
$m_{44} = 2.01 - 2 s_{12} + 0.2\,(s_{16} - s_{26})$

If we can assume that the basic variates are exactly
uncorrelated, (13.4) reduces to

(13.5)
$m_{12} = 0$
$m_{13} = 1$
$m_{23} = 1$
$m_{14} = 1$
$m_{24} = -1$
$m_{34} = 0$
$m_{11} = 1.01$
$m_{22} = 1.01$
$m_{33} = 2.01$
$m_{44} = 2.01$

Determining in an actual case the values of the basic variates
$y$ by random drawings we would, of course, not get exact
non-correlation between them, but some small accidental
intercorrelations. These intercorrelations between the basic variates
would not, however, have a very great effect on the regressions
computed. For the three-sets this is seen by comparing
(13.6) with (13.7). (For the $s$ were used the values obtained
in the experiment which is discussed in more detail in Section
23.) It is seen that (13.6) and (13.7) give essentially the same
result. And this result is also very close to the "true" regression
coefficients given by (13.3). In order to make the comparison
with (13.3) easy the coefficients in (13.6) and (13.7)
are reduced so as to give the same absolute row sums as
in (13.3).

TABLE (13.6). Diagonal regression coefficients computed on the basis
of the complete moments (13.4).

In set not
containing the      x_1        x_2        x_3        x_4
variate No.

      1              0       + 1.985    - 1.021    + 0.994
      2           + 2.011       0       - 0.991    - 0.998
      3           + 1.028    - 0.985       0       - 0.987
      4           + 1.008    + 0.999    - 0.993       0


TABLE (13.7). Diagonal regression coefficients computed on the basis
of the abbreviated moments (13.5).

In set not
containing the      x_1        x_2        x_3        x_4
variate No.

      1              0       + 1.990    - 1.005    + 1.005
      2           + 1.990       0       - 1.005    - 1.005
      3           + 1.002    - 1.002       0       - 0.996
      4           + 1.002    + 1.002    - 0.996       0


When we consider the four-set (1234) we find some, although
not a very great, discrepancy between the result obtained by
using the complete moments (13.4) and the abbreviated
moments (13.5). This is seen by comparing (13.8) with (13.9).


TABLE (13.8). ADJOINT MOMENT MATRIX COMPUTED ON THE
BASIS OF THE COMPLETE MOMENTS (13.4).

$\hat{m}_{ij}$      j = 1        2          3          4

i = 1        0.067      0.007    - 0.036    - 0.029
    2                   0.066    - 0.037      ...
    3                              0.037      ...
    4                                         ...


TABLE (13.9). ADJOINT MOMENT MATRIX COMPUTED ON THE
BASIS OF THE ABBREVIATED MOMENTS (13.5).

$\hat{m}_{ij}$      j = 1        2          3          4

i = 1        0.061      0.000    - 0.030    - 0.030
    2                   0.061    - 0.030      0.030
    3                              0.030      0.000
    4                                         0.030

Of course neither (13.8) nor (13.9) can be compared with any
"true" table of coefficients, since it has no meaning to speak
of a regression equation in the four-set (1234).

If the degradation effect manifests itself in the way suggested
above, we would expect that the empirical regression coefficients
determined in the set (1234), in other words, in the set
that is too large to give a meaning to the coefficients (because
the set is multicollinear), should be more or less equal to the
coefficients determined in the sets that are too small to give the
correct net coefficients (because the sets do not include all
significant variates). That this is actually so, will appear, for
instance, by a comparison between the elementary regression
coefficient of $x_1$ on $x_2$ in the sets (12), (124) and (1234)
respectively. Using the complete moments (13.4) we find

(13.10)
$b_{12} = -0.120$
$b_{12.4} = +0.919$
$b_{12.34} = -0.112$

The coefficient determined in the four-set is here virtually
equal to the corresponding gross regression coefficient
determined in the two-set. And both these regression coefficients,
both that in the two-set and that in the four-set, are entirely
different from the "true" coefficient existing in the three-set
and being equal to + 1.

In the more complete discussion of this numerical example 
in Sections 23 and 24 we shall see that this degradation effect 
is present all over. 
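
The degradation effect can be reproduced in a few lines from the construction (13.1). The sketch below assumes exactly uncorrelated basic variates of unit variance, so that it works with the abbreviated moments (13.5) rather than with the complete moments (13.4); its figures are therefore not those of (13.10), which rest on the experimental values of the $s$. The function name is hypothetical, and the elementary regression coefficients are obtained by solving the ordinary normal equations.

```python
import numpy as np

# Loadings of x1..x4 on the basic variates y1..y6, read off from (13.1).
A = np.array([
    [1.0,  0.0, 0.1, 0.0, 0.0, 0.0],   # x1 = y1 + 0.1 y3
    [0.0,  1.0, 0.0, 0.1, 0.0, 0.0],   # x2 = y2 + 0.1 y4
    [1.0,  1.0, 0.0, 0.0, 0.1, 0.0],   # x3 = y1 + y2 + 0.1 y5
    [1.0, -1.0, 0.0, 0.0, 0.0, 0.1],   # x4 = y1 - y2 + 0.1 y6
])

# Assumption: the y are exactly uncorrelated with unit variance, so the
# moment matrix of the x is A A^T, i.e. the abbreviated moments (13.5).
M = A @ A.T

def elementary_coefficient(M, dependent, regressor, others=()):
    """Coefficient of `regressor` when `dependent` is regressed on
    `regressor` and `others` (variate numbers are 1-based)."""
    cols = [regressor - 1] + [k - 1 for k in others]
    lhs = M[np.ix_(cols, cols)]          # moments among the regressors
    rhs = M[cols, dependent - 1]         # moments with the dependent variate
    return float(np.linalg.solve(lhs, rhs)[0])

b12 = elementary_coefficient(M, 1, 2)                     # two-set (12)
b12_4 = elementary_coefficient(M, 1, 2, others=(4,))      # three-set (124)
b12_34 = elementary_coefficient(M, 1, 2, others=(3, 4))   # four-set (1234)

print(f"b12    = {b12:+.3f}")
print(f"b12.4  = {b12_4:+.3f}")
print(f"b12.34 = {b12_34:+.3f}")
```

Under these assumptions the coefficient in the three-set (124) comes out close to the "true" value + 1, while the coefficient in the four-set falls back to the gross value found in the two-set.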

PART III. BUNCH ANALYSIS. 

14. REGRESSION STABILITY UNDER A CHANGE IN THE 
MINIMALISATION DIRECTION. 

Let us sum up the main conclusions of the analysis of the
last Sections. Each observational variate is considered as made
up of a systematic part and a disturbance. The first is that part
which we may hope to "explain" by considering simultaneously
the variations of several observational variates, the "explanation"
taking statistically the form of one or more regression
equations between the variates. The other part is that which
we cannot hope to "explain" in such a way. Judging the
significance of a coefficient in such a regression involves
therefore two questions, one regarding the systematic variations,
the other regarding disturbances: (I) Does the coefficient
represent a net relation between the two variates which it
connects? In other words, are all other systematic influences
eliminated? (II) What is the precision of the determination?
That is to say, how much discrepancy must we be prepared to
reckon with between the actually computed value of the
coefficient and that value which would have emerged if no
disturbance had been present?

This distinction between systematic variations and disturbances
being adopted, the purpose of any attempt at statistical
determination of regression coefficients may be formulated
thus: We want to compute a coefficient of which it can be said
in the first place that it represents a net relation, and in the
second place that it is very improbable that it is widely
different from that result that would have emerged if no
disturbance had been present.

If we have reason to believe that an empirically determined
intercoefficient in a regression equation does not represent a
net relation in the above sense, we may try to include in the
regression equation one or more new variates and then
determine the intercoefficient as it appears in the new equation.
But in so doing we must be very careful. There are
at least four different things to take account of in this
connection.

(1) In the first place the new variate may contain a
component that is systematically connected with the other
variates; it is just because of this component that the idea
presents itself to include the new variate as an element in the
analysis. But the presence of some such systematic component
in the new variate is in itself not a sufficient ground to include
this variate in the analysis.

(2) Account must also be taken of the fact that a disturb- 
ing element is always introduced as a part of the new variate. 
This disturbing element will lessen the precision of the total 
result obtained; and the loss on this account may be larger 
than the gain produced by the introduction of the systematic 
part of the new variate. 

(3) The situation may further be such that not only is a
new disturbing element introduced by the new variate, but the
inclusion of this variate may create a situation where the
disturbing elements contained in all the variates are given a
larger opportunity of influencing the result than before. The whole
regression technique, built on moment determinants etc., becomes
indeed all the weaker and all the more sensitive to random
disturbances the larger the number of variates included. This
increase in the sensitivity to random disturbances is particularly
great if in the new set of variates there are two (or more)
subsets whose systematic parts are fairly well linearly
connected. In this case the new set will be near to a multiply
collinear set, and the regression coefficients will consequently
depend essentially on the random disturbances. We get a
degradation effect as discussed in Section 13.

(4) The tendency towards nonsense results mentioned in (3)
will be lessened so far as a certain intercoefficient is concern- 
ed, if the structural relations between the variates are of a 
special sort, namely such that the intercoefficient in question 
is nearly the same in all the subsets in that big set which 
comes near to being multiply collinear. This is the persistency 
effect studied in Section 11. 

The kind of change we shall get when we include the new
variate will depend on the relative strengths of all these four
tendencies.

Both the size and the precision of a given intercoefficient 
may change. The precision may for instance be so much 
weakened that we would have been better off by not including 
the new variate. It might have been better deliberately to 
leave some of the systematic bias in the coefficient in order to 
be better protected against the random disturbances. This is 
quite conceivable if the object of the statistical investigation is 
— as we formulated it above — to get a regression coefficient 
of which it can be said that it is very improbable that it is 
widely different from the "true" regression. To use an
illustration: in target shooting the result depends, not only on the
correct aiming but just as much on the steadiness with which
one pulls the trigger. If for some particular reason it is
impossible to pull the trigger steadily when one aims exactly
at the target, it is quite conceivable that it would be better
deliberately to aim a little to the side of the target. And so in
statistical analysis it may be found safer deliberately to
leave some bias in the regression coefficients by not including
a certain variate in the analysis.

Does there exist any empirical criterion which can tell us 
whether — when all these various factors are taken into 
account — a certain variate ought to be included or not? 

Let us for a moment revert to the two-variate example
discussed in Section 9. We found that under the assumptions
(I), (II) and (III') of Section 7 the "true" regression slope must
lie between the regression slope obtained by minimising in the
$x_1$ direction and that obtained by minimising in the $x_2$
direction. These two slopes form limits between which the true slope
must lie whenever the assumptions specified hold good. But
there is nothing in the observed correlation matrix (here
consisting only of the correlation coefficient $r_{12}$ and the unit
elements in the diagonal) which permits us to choose between the
above two limits, or to fix any number intermediate between
them. Thus it is when, and only when, there is a good
agreement between results obtained by the two minimalisations,
that the application of the assumptions (I), (II) and (III')
permits us to draw any definite conclusions about the "true"
regression slopes.
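
The two limits can be written down directly in a small population example. The sketch below is an illustration under assumptions that are not taken from Section 7 itself: the systematic parts are taken to satisfy $x'_1 = \beta x'_2$ exactly, the two disturbances are taken as uncorrelated with each other and with the systematic parts, $\beta$ is taken positive, and all names and numerical values are hypothetical.

```python
def slope_limits(beta, s, v1, v2):
    """Slopes of x1 with respect to x2 from the two minimalisations.

    beta : 'true' slope, systematic parts satisfy x1' = beta * x2'
    s    : variance of the systematic part of x2
    v1   : variance of the disturbance in x1
    v2   : variance of the disturbance in x2
    """
    m11 = beta ** 2 * s + v1          # observed moments implied by the assumptions
    m22 = s + v2
    m12 = beta * s
    slope_min_x1 = m12 / m22          # minimising in the x1 direction
    slope_min_x2 = m11 / m12          # minimising in the x2 direction
    return slope_min_x1, slope_min_x2

lower, upper = slope_limits(beta=2.0, s=1.0, v1=0.2, v2=0.1)
print(f"limits with disturbances:    {lower:.3f} <= 2.0 <= {upper:.3f}")

exact_lower, exact_upper = slope_limits(beta=2.0, s=1.0, v1=0.0, v2=0.0)
print(f"limits without disturbances: {exact_lower:.3f}, {exact_upper:.3f}")
```

The gap between the two limits closes only as the disturbance variances shrink, which is the sense in which good agreement between the two minimalisations is needed before anything definite can be said about the "true" slope.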

Even without carrying a similar analysis through exactly in 
the general case of n variates there is one conclusion which we 
can draw immediately. If the random intensities are compara- 
tively small and the systematic variates in the set considered 
are linearly connected, then there must necessarily be a small 
disagreement between the results obtained by minimalisation in 
the various directions. In other words, a good agreement in 
this sense is a necessary condition for a situation where we 
are allowed to conclude that the shape of the scatter reveals 
anything definite about the ’’true” linear connection between 
the variates. In the case of many variates this condition may 
not be rigorously sufficient. More precisely expressed: if several
determinations of a given regression slope, for instance
the one between $x_1$ and $x_2$, are made by minimalisations in
different directions in the big set, it is conceivable that
although the big set is multiply collinear, so that each of the
results obtained is influenced primarily by the disturbing
intensities (that is, each result has fictitious determinateness), it may
still happen that a number of them coincide more or less by
pure chance. The disturbing intensities may be so distributed
that some agreement in the result is obtained. In practice such
a situation will however be very improbable. It will be all the
more improbable the greater the number of different determinations
that coincide and the more perfect the multicollinearity that
exists in the big set. The regression coefficients will then
almost certainly change violently with the direction of the
minimalisation.

On the other hand, each determination of a given regression 
slope may be looked upon as answering one particular question 
which we have put to the data. If the data continues to give 
consistent answers when asked in different ways, it seems 
plausible to accept this as a criterion that there is something 
significant in these answers. 

Therefore, if we study a given regression coefficient, say
the one between $x_1$ and $x_2$, and include more and more variates
into the analysis, we should expect that the cluster of the
results obtained by minimalisation in the various directions
will become tighter and tighter as new and really relevant
variates are included, but we should also expect that the
cluster will suddenly "explode" when some variate is introduced
which makes the set multiply collinear. This is the first
main idea of the confluence technique which we are now
going to discuss.
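
The tightening and the subsequent "explosion" of the cluster can be seen on the example of Section 13. The sketch below is an illustration under assumptions: it uses the abbreviated moments (13.5), obtained from the loadings (13.1) with exactly uncorrelated basic variates, and it takes the regression obtained by minimising in the direction of variate $k$ to have coefficients proportional to the $k$-th row of the adjoint moment matrix, so that the slope of $x_i$ with respect to $x_j$ is minus the ratio of the adjoint elements $(k, j)$ and $(k, i)$. The function names and the tolerance are hypothetical.

```python
import numpy as np

# Loadings from (13.1); M = A A^T gives the abbreviated moments (13.5).
A = np.array([
    [1.0,  0.0, 0.1, 0.0, 0.0, 0.0],
    [0.0,  1.0, 0.0, 0.1, 0.0, 0.0],
    [1.0,  1.0, 0.0, 0.0, 0.1, 0.0],
    [1.0, -1.0, 0.0, 0.0, 0.0, 0.1],
])
M = A @ A.T

def adjoint(S):
    """Adjoint (matrix of cofactors) of a symmetric matrix, by direct expansion."""
    n = S.shape[0]
    adj = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(S, i, axis=0), j, axis=1)
            adj[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return adj

def bunch(M, subset, i=1, j=2, tol=1e-9):
    """Slope of x_i with respect to x_j in `subset`, one value for each
    minimalisation direction; an indeterminate slope is returned as inf."""
    idx = [k - 1 for k in subset]
    adj = adjoint(M[np.ix_(idx, idx)])
    pi, pj = subset.index(i), subset.index(j)
    slopes = {}
    for pos, k in enumerate(subset):
        numerator, denominator = -adj[pos, pj], adj[pos, pi]
        slopes[k] = numerator / denominator if abs(denominator) > tol else float("inf")
    return slopes

three = bunch(M, (1, 2, 4))        # set with one linear relation
four = bunch(M, (1, 2, 3, 4))      # multiply collinear set
print("set (124): ", {k: round(v, 3) for k, v in three.items()})
print("set (1234):", {k: round(v, 3) for k, v in four.items()})
```

In the set (124) the three determinations lie close together around + 1; in the set (1234) the same slope comes out differently for every minimalisation direction.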

An essential point in this connection is that each regression 
slope is treated separately in all the subsets and for all possible 
minimalisation directions. Already in Sections 6 and 6 we have 
made some use of the idea of regression stability under a 
change in the minimalisation direction, but there we condensed 
all the information in one single testparameter for each selec- 
tion of the variates. Now we are to study the dispersion of the 
individual results. The various minimalisation directions are, 
so to speak, considered as elements in a ’’sample”, and from 
the organisation or lack of organisation in this sample we are 
going to judge the significance of the result. This will prove 
to be a powerful tool of analysis. 

The second main idea is that the spread of the results obtained
by the minimalisation in different directions is all the way
through compared with the average value of these results. In
other words, we combine the study of net slopes with the study
of precision. This combination is one of the essential features
which make the present method superior to those described in
Part I, where the criteria used did not involve the slopes. The
present procedure may perhaps be called bunch analysis on
account of the particular graphical form in which it is natural
to express the results for analysis. This graphical form is
explained in Section 16.

It is on purpose that I have not attempted to give any formal
and rigorous definition of the "probability" for a specified
result obtained by the different minimalisations. Such a formal
definition may indeed be obtained by starting from many different
types of abstract schemes. Each scheme will lead to a
particular definition of the probability in question. By focussing
too much attention on the exact definition of the probability
there is some risk that one will forget the very relative and
limited meaning which must always attach to such a numerical
computation of a "probability". It is indeed only in a very
special meaning that any such probability can be said to
measure the "significance" of the results. At least, to start
with, I believe it will be a better application of time and energy
to work experimentally with the method and rely on one's
intuitive judgement of whether a given spread in the various
determinations of a given regression coefficient is reasonable
or not.

15. THE TILLING TECHNIQUE. CONCENTRIC NUMBERING. 

The carrying through of the analysis whose main ideas are
indicated in the last Section necessitates the computation of all
the elementary regressions in all possible subsets. In practice
this is done most conveniently in normal coordinates, that is to
say the correlation matrix $(r_{ij})$ is taken as the point of
departure instead of the moment matrix $(m_{ij})$.

The problem is thus to compute all the adjoint elements
$\hat{r}_{ij(\alpha\beta\dots\gamma)}$ in all possible subsets. In other words, if there are
$n$ variates in the big set, we need to compute the adjoint matrix
of the $n$-rowed correlation matrix in the big set $(12 \dots n)$,
further we need the adjoints of all the $n$ $(n-1)$-rowed matrices
obtained by leaving out one variate at a time, further we need
the adjoints of all the $(n-2)$-rowed matrices obtained by
leaving out two variates at a time, and so on.

On the face of it this work seems to be prohibitive, but a
systematic way of doing it can be found that makes it a
relatively simple job. The method which I have developed for
this purpose is now used extensively at the Institute in Oslo.
We refer to this work as "total tilling" or shorter "tilling" of
the correlation matrix.

A practical tool used in tilling, and also for many other
purposes, is what may be called concentric numbering. It is
a systematic way of numbering the combinations that may
be formed by selecting in all possible ways $p$ elements out of $n$.
For brevity the binomial coefficient $\binom{n}{p}$ is used to indicate, not
only the number of such combinations, but also symbolically
to denote the operation of numbering.

The concentric numbering is built up in such a way that if
a new element is added, that is if we go from $n$ to $n + 1$, the list
of combinations is simply elongated, without inserting any new
numbers between those already written. This is a practical
advantage in statistical work, where it will frequently be found
necessary to include new variates in the course of the
investigation.

The concentric numbering is defined by recurrence. First, the
concentric numbering of the $n$ elements $1, 2 \dots n$, one at a time,
that is for $p = 1$, is simply defined as the natural sequence of
these $n$ numbers; thus the concentric numbering $\binom{n}{1}$ consists of
the $n$ ordinals $1, 2 \dots n$. On the other hand there is only one
complex which can be formed of the $n$ elements $1, 2 \dots n$ taken
$n$ at a time. Thus in the concentric numbering $\binom{n}{n}$ there is only
1 ordinal, and it is written $12 \dots n$.

These definitions of the concentric numberings $\binom{n}{1}$ and $\binom{n}{n}$
being laid down, we define concentric numbering $\binom{n}{p}$ as obtained
by first writing the list of ordinals occurring in the concentric
numbering $\binom{n-1}{p}$ and then elongating the list by writing the
ordinals of concentric numbering $\binom{n-1}{p-1}$, adding to each of the
latter ordinals the letter $n$. For six elements this gives for
instance:

$\binom{6}{1}$   $\binom{6}{2}$   $\binom{6}{3}$   $\binom{6}{4}$   $\binom{6}{5}$   $\binom{6}{6}$

  1     12    123    1234    12345    123456
  2     13    124    1235    12346
  3     23    134    1245    12356
  4     14    234    1345    12456
  5     24    125    2345    13456
  6     34    135    1236    23456
        15    235    1246
        25    145    1346
        35    245    2346
        45    345    1256
        16    126    1356
        26    136    2356
        36    236    1456
        46    146    2456
        56    246    3456
              346
              156
              256
              356
              456
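
The recurrence just described translates directly into a short recursive function. The sketch below is one possible rendering; the function name and the representation of an ordinal as a tuple of element numbers are choices made only for this illustration.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def concentric(n, p):
    """Concentric numbering of the combinations of p elements out of 1..n.

    Each ordinal is returned as a tuple of element numbers, e.g. (1, 2, 4).
    """
    if not 1 <= p <= n:
        raise ValueError("need 1 <= p <= n")
    if p == 1:                        # natural sequence 1, 2 ... n
        return tuple((k,) for k in range(1, n + 1))
    if p == n:                        # the single complex 12...n
        return (tuple(range(1, n + 1)),)
    # First the list for n-1 elements, then the list for (n-1, p-1)
    # with the new element n added to each ordinal.
    return concentric(n - 1, p) + tuple(c + (n,) for c in concentric(n - 1, p - 1))

def as_text(combination):
    """Write an ordinal the way it appears in the table, e.g. '124'."""
    return "".join(str(k) for k in combination)

for p in range(1, 7):
    print(p, [as_text(c) for c in concentric(6, p)])

# Going from 5 to 6 elements only elongates the list.
print(concentric(6, 3)[:len(concentric(5, 3))] == concentric(5, 3))
```

The last line checks the property stressed in the text: when a new element is added, the ordinals already written keep their places and the new ones are appended at the end.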


In order to carry the tilling technique through we first
prepare $2^n - n - 1$ adjunction tables, namely $\binom{n}{2}$ 2-rowed tables,
$\binom{n}{3}$ 3-rowed tables, etc. Each such table corresponds to a
given combination in the concentric numbering of the subsets,
and the cells of such a table shall receive the results of the
adjunction within this subset. In other words, each such table
is going to be a table of an adjoint correlation matrix. Each
row and column in such a table is numbered according to the
place which the elements in question hold in the big set. Thus
the 4 three-rowed tables in a four-variate problem will be

      1  2  3          1  2  4          1  3  4          2  3  4
  1                1                1                2
  2                2                3                3
  3                4                4                4

The elements in, say, the second of these tables are defined as
those obtained by first forming the three-rowed correlation
matrix consisting of the rows 124 and the columns 124 from the
original four-rowed correlation matrix, and then taking the
adjoint of this three-rowed matrix. The last element in the
second row of the adjoint considered will for instance be

(15.1)  $\hat{r}_{24(124)} = -(r_{42} - r_{41} r_{12})$.

These tables may for brevity be called the tilling tables.
Their elements may be called the tilling elements. The tilling
tables are symmetric because

(15.2)  $\hat{r}_{ij(\alpha\beta\dots\gamma)} = \hat{r}_{ji(\alpha\beta\dots\gamma)}$.

As an extra bottom row in each tilling table we provide
space for the numbers obtained by taking the product sum of
the elements in each column of the tilling table with the
corresponding elements in the original correlation table. Of course
this product sum will be nothing but the value of the
determinant $\Delta_{(\alpha\beta\dots\gamma)}$ where $(\alpha\beta\dots\gamma)$ is the subset considered (i.e. the
concentric number on the tilling table in question). The
numbers entered in the bottom row of a given tilling table
ought therefore all to be equal (apart from inaccuracies due to
the fact that the computations are carried through with a
limited number of decimal places). This is taken as a check
on the computation; and at the same time it serves to compute
the scatterance in the set $(\alpha\beta\dots\gamma)$ to which the table refers.

The check in question can also be performed in the following
way: In an extra column to the right in each tilling table we
put down the sum of the elements in the various rows of the
corresponding correlation matrix itself, that is, in the tilling
table (αβ…γ) we put down the sums

(15.3)    $s_{\kappa} = r_{\kappa\alpha} + r_{\kappa\beta} + \ldots + r_{\kappa\gamma}$    $(\kappa = \alpha, \beta \ldots \gamma)$.

Then we take the product sum of each column in this tilling
table with the column consisting of the numbers
$s_{\alpha}, s_{\beta} \ldots s_{\gamma}$. This product sum ought also to be
equal to the determinant of the set. And this applies no matter
which one of the columns of the tilling table we are multiplying
with. Indeed, by the product summation in question the result
arising from any term in the right member of (15.3) will be zero,
except the result from one special of the terms, namely the one
corresponding to the tilling table column with which we are
multiplying; and this latter result will be the determinant. This
check is very reliable because it really amounts to testing each
column in the tilling table by all the columns in the original
matrix. But it involves a little extra work, namely the formation
of the sums (15.3). In most cases we only use the check
based on the elementary columns of the given correlation
matrix.
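A minimal sketch of the row-sum check, again with a made-up three-variate correlation table and hypothetical helper names. It also shows the check reacting when a single tilling element is deliberately spoiled.

```python
from fractions import Fraction

def det(m):
    """Determinant by Laplace expansion along the first row."""
    if not m:
        return Fraction(1)
    total = Fraction(0)
    for c, a in enumerate(m[0]):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total += (-1) ** c * a * det(minor)
    return total

def adjoint(m):
    """Adjoint (transposed cofactor matrix) of a square matrix."""
    n = len(m)
    adj = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
            adj[j][i] = (-1) ** (i + j) * det(minor)
    return adj

def row_sum_check(table, sub):
    """Product sum of every tilling-table column with the row sums of `sub`."""
    size = len(sub)
    sums = [sum(row) for row in sub]  # one sum per row of the correlation table
    return [sum(table[k][c] * sums[k] for k in range(size)) for c in range(size)]

F = Fraction
# Hypothetical correlation table of a three-variate set.
sub = [[F(1), F(1, 2), F(1, 5)],
       [F(1, 2), F(1), F(1, 4)],
       [F(1, 5), F(1, 4), F(1)]]

table = adjoint(sub)
good = row_sum_check(table, sub)   # every entry equals the determinant
table[0][2] += F(1, 1000)          # simulate a slip in one tilling element
bad = row_sum_check(table, sub)    # the affected column no longer agrees
print(good)
print(bad)
```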

Since the tilling tables are symmetric, only the diagonal and
one of the two triangles are filled in; as a rule we use the
north-east triangle. A "row" or a "column" must then be
interpreted as a broken line reflected under 45° on the diagonal.

The elements in the tilling tables may be built up
systematically starting with the two-rowed tables; by means of these
one computes the three-rowed tables, etc. The technique is as
follows.

When the tilling tables of a certain level (for instance the
three-rowed tables) are computed and checked, all the diagonal
elements in the tables of the next higher level are first filled
in. All these are scatterances (principal minors of the original
correlation matrix), and are therefore already computed. They
are found in the bottom rows of the tables of the lower level.
The problem is therefore only to compute elements of the form
$\hat r_{ij(\alpha\beta\ldots\gamma)}$ where $i \neq j$. Let v be the number of
affixes in the set (αβ…γ). Further, let i stand as the p-th number
in the sequence (α, β … γ), and j as the q-th. For instance if
(αβ…γ) is the set (12567), hence v = 5, and if i = 2, j = 5, we
have p = 2, q = 3. The numbers p and q being thus defined, the
element $\hat r_{ij(\alpha\beta\ldots\gamma)}$ is equal to $(-)^{p+q}$ times the
determinant:

(15.4)    $\det\,[\,r_{\kappa\lambda}\,]$,    $\kappa = \alpha, \beta \ldots )i( \ldots \gamma$;    $\lambda = \alpha, \beta \ldots )j( \ldots \gamma$

The symbol (15.4) denotes the determinant formed by the
(v - 1) rows α, β …)i(… γ and the (v - 1) columns α, β …)j(… γ
of the original correlation determinant; the inverted parenthesis
)( meaning "exclusion of".

Since i ≠ j, we know that the row j and the column i actually
occur in (15.4). If i < j we may therefore write (15.4) more
explicitly in the form

(15.5)    head line (columns):   α  β … i … )j( … γ
          left margin (rows):    α  β … )i( … j … γ

Let us develop the (v - 1)-rowed determinant (15.5) according
to the row j. Since j was the q-th number in the sequence
α, β … γ, it will be the (q - 1)th number in the sequence
α, β …)i(… γ (when i < j). The first term of the expansion of
(15.5) will therefore be $r_{j\alpha}$ times $(-)^{(q-1)+1}$ times the
determinant

(15.6)    head line (columns):   )α(  β … i … )j( … γ
          left margin (rows):    α  β … )i( … )j( … γ

The next term will be $r_{j\beta}$ times $(-)^{(q-1)+2}$ times the determinant
obtained from (15.6) when in the head line )α( β … is replaced
by α )β( …. And so on. Apart from the sign the various
determinants occurring in this development are to be found in one of
the tilling tables of the next lower level. The determinant (15.6)
for instance is nothing but

(15.7)    $(-)^{p+1}\,\hat r_{i\alpha(\alpha\beta\ldots)j(\ldots\gamma)}$    (when i < j)

(15.5) is consequently equal to

(15.8)    $(-)^{p+q-1} \sum_{\kappa} r_{j\kappa}\,\hat r_{i\kappa(\alpha\beta\ldots)j(\ldots\gamma)}$,    $\kappa = \alpha, \beta \ldots )j( \ldots \gamma$

and therefore

(15.9)    $\hat r_{ij(\alpha\beta\ldots\gamma)} = -\sum_{\kappa} r_{j\kappa}\,\hat r_{i\kappa(\alpha\beta\ldots)j(\ldots\gamma)}$,    $\kappa = \alpha, \beta \ldots )j( \ldots \gamma$

This formula was developed on the assumption that i < j, but
it obviously holds good also if i > j; the only difference in this
latter case is that the sign factor of (15.6) will now be $(-)^{q+1}$
instead of $(-)^{(q-1)+1}$ and that of (15.7) will be $(-)^{(p-1)+1}$
instead of $(-)^{p+1}$, so that the sign factor of (15.8) and hence
of (15.9) is unchanged. Since all the tilling tables are symmetric
by virtue of their definition we see that we may interchange i
and j in the right member of (15.9).

The formula (15.9) is capable of a very mechanical and easy
application. As an example, suppose that it is wanted to
compute $\hat r_{25(12567)}$. In the tilling table (12567) the row 2 and
the column 5 are covered with cardboard or metal strips. This
leaves the four figures 1567 to be read in the left margin. We
look up the already computed tilling table for this set (1567).
Here we consider the column 5 (i. e. that column which was
covered in the table (12567)). We take the product sum of this
column with the column 2 in the original matrix; the result,
with the sign changed, is the element sought, namely
$\hat r_{25(12567)}$.

For this work it is convenient to keep each of the columns
in the original correlation matrix written (or better
typewritten) on a separate strip of paper or cardboard. The pairing
of the columns which are to be product-summed is then an
easy matter. It is a particular advantage that there is no fuss
with the signs in the formula (15.9). One simply takes the
product sum of numbers that are already written in the tilling
tables or in the original correlation matrix, and then always
changes the sign of the final result.

When all the elements of the tables of the v-rowed level are
thus computed, the tables are checked as explained above and
one proceeds to the tables of the next higher level, etc. In
Section 23 is given a complete example of these computations.
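The level-by-level procedure can be sketched as follows. This is a hedged illustration, not the worked example of Section 23: the five-variate correlation matrix is made up, exact fractions replace decimal arithmetic, and the name `complete_tilling` is hypothetical. Each off-diagonal element is obtained exactly as in the strip technique: cover row i, look up the table of the remaining set, product-sum its column j with column i of the original matrix, and change the sign.

```python
from fractions import Fraction
from itertools import combinations

def complete_tilling(R):
    """Build every tilling table from the tables one level below.

    Returns (tables, scatterance): tables[S][(i, j)] is the tilling element
    for the pair (i, j) in the set S, scatterance[S] the determinant of S.
    Variates are numbered from 1, as in the text.
    """
    n = len(R)

    def r(a, b):
        return R[a - 1][b - 1]

    tables, scatterance = {}, {}
    for a in range(1, n + 1):          # one-rowed level: adjoint of a 1x1 matrix is 1
        tables[(a,)] = {(a, a): Fraction(1)}
        scatterance[(a,)] = Fraction(r(a, a))
    for v in range(2, n + 1):
        for S in combinations(range(1, n + 1), v):
            T = {}
            for i in S:                # diagonal: scatterance of the set without i
                T[(i, i)] = scatterance[tuple(k for k in S if k != i)]
            for i, j in combinations(S, 2):
                rest = tuple(k for k in S if k != i)   # row i covered
                lower = tables[rest]
                # product sum of column j of the lower table with column i of R,
                # with the sign changed
                T[(i, j)] = T[(j, i)] = -sum(r(k, i) * lower[(k, j)] for k in rest)
            bottom = [sum(T[(k, c)] * r(k, c) for k in S) for c in S]
            assert len(set(bottom)) == 1, "bottom-row check failed"
            tables[S], scatterance[S] = T, bottom[0]
    return tables, scatterance

# Hypothetical correlation matrix: r_ij = 1 / (1 + |i - j|).
R = [[Fraction(1, 1 + abs(i - j)) for j in range(5)] for i in range(5)]
tables, scatterance = complete_tilling(R)
print(tables[(1, 2, 5)][(2, 5)], scatterance[(1, 2, 3, 4, 5)])
```

Every number produced in the inner loop is a finished tilling element, which mirrors the remark that nothing is written down that is not itself a result.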

It will be noticed that all the tilling work is perfectly 
mechanised. It can indeed be done practically without any 
thinking. This is a big element in eliminating errors. Further, 
it will be noticed that in the course of the work not a single 
number is written that is not itself a finished result. And 
all the numbers are written immediately in just the place where 
they belong when the result is to be tabulated in a form conve- 
nient for systematic analysis of the various possible regression 
equations. 

Our experience is that a complete tilling takes less time than 
the preparatory work of computing moments and correlation 
coefficients, which are indispensable for linear regression 
analysis according to any method. Once time and money have 
been invested in working out these basic parameters it is well 
worth taking the comparatively little extra trouble needed to 
make a complete tilling. 

It is strongly to be recommended to do the complete tilling 
at once. No attempt should be made to pick out a few sets 
which the investigator for one reason or another believes are 
the most important. The essential point in the present method 
is just that all the sets are discussed without any preconceived 
ideas. All my practical experience with the method indicates 
that the complete tilling frequently brings out things which 
were not suspected at the outset. Furthermore, as the com- 
parison of the various sets goes on, one will frequently want to 
skip back and forth looking now at one set, now at another. In 
the course of such an analysis it is a nuisance to be stopped 
because some of the sets are not computed. The work involved 
in the computation will of course also run much quicker and 
smoother when all of it is done systematically at once, instead 
of being done piecemeal. 

As an example of the smoothness and steadiness with which
the tilling work goes on I may mention that if six decimal places
are carried, young assistants at the Institute will, when they
have become familiar with the method, usually make something
between 3 and 7 wrong multiplications in the course of all the
tilling work in a six variate problem (which, as will be seen
from (15.13), involves 1386 multiplications). And, of course,
such mistakes are so immediately localised by the above
mentioned checks that hardly any time is lost correcting them.
The total tilling for a six variate problem carrying six decimal
places usually takes about 13 hours, all checks and corrections
included.

For any number of variates the time needed can be estimated
on the basis of the number of multiplications involved, and this
estimate will as a rule be a very close one because the work
consists nearly exclusively of mechanical product summations.
The number of multiplications is as follows: There are
$\binom{n}{k}$ k-rowed sets. Each of these contains $\binom{k}{2}$ elements
that need to be computed, and each such computation involves
(k - 1) multiplications. The total number of multiplications
necessary in order to compute the elements is consequently

(15.10)    $\sum_{k=2}^{n} \binom{n}{k}\binom{k}{2}(k-1) = n\binom{n}{2}2^{n-3}$

The checks on a k-rowed table involve $k^{2}$ multiplications, which
gives a total of

(15.11)    $\sum_{k=2}^{n} \binom{n}{k}k^{2} = n(n+1)2^{n-2} - n$

so that the grand total will be

(15.12)    $n\left\{\binom{n}{2}2^{n-3} + (n+1)2^{n-2} - 1\right\}$
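The two closed forms are stated without proof. One way to obtain them, offered here as a sketch, uses only the standard binomial identities $\binom{n}{k}\binom{k}{2} = \binom{n}{2}\binom{n-2}{k-2}$, $\sum_m \binom{N}{m} m = N2^{N-1}$ and $\sum_k \binom{n}{k}k^{2} = n(n+1)2^{n-2}$:

```latex
\begin{align*}
\sum_{k=2}^{n}\binom{n}{k}\binom{k}{2}(k-1)
  &= \binom{n}{2}\sum_{k=2}^{n}\binom{n-2}{k-2}(k-1)
   = \binom{n}{2}\sum_{m=0}^{n-2}\binom{n-2}{m}(m+1) \\
  &= \binom{n}{2}\left[(n-2)\,2^{n-3} + 2^{n-2}\right]
   = n\binom{n}{2}2^{n-3}, \\[4pt]
\sum_{k=2}^{n}\binom{n}{k}k^{2}
  &= \sum_{k=0}^{n}\binom{n}{k}k^{2} - n
   = n(n+1)\,2^{n-2} - n .
\end{align*}
```

Adding the two results gives the grand total $n\{\binom{n}{2}2^{n-3} + (n+1)2^{n-2} - 1\}$.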

For n = 6 we get for instance 720 multiplications for the direct
computations and 666 for the checks, giving a grand total of
1386. On the average we reckon about 100 multiplications of
six decimal places per hour (checks, corrections and occasional
rests included). The number 1386 therefore checks exactly
with the above mentioned experience of the six-variate problem.
The values of (15.12) for n = 1, 2 … are given in Table (15.13).

TABLE (15.13). NUMBER OF MULTIPLICATIONS INVOLVED IN
COMPLETE TILLING (ALL CHECKS INCLUDED).

    Number of        Number of multiplications
    variates n       according to (15.12)

         3                      30
         4                     124
         5                     435
         6                   1 386
         7                   4 137
         8                  11 768
         9                  32 247
        10                  85 750
        11                 222 453
        12                 565 236
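The entries of the table can be recomputed directly from the counting argument. The short sketch below (function names are hypothetical) sums over all k-rowed sets and compares the result with the closed form (15.12), used here only for n of 3 and above so that every power of two is an integer.

```python
from math import comb

def direct_multiplications(n):
    """Multiplications needed to compute the off-diagonal tilling elements."""
    return sum(comb(n, k) * comb(k, 2) * (k - 1) for k in range(2, n + 1))

def check_multiplications(n):
    """Multiplications needed for the bottom-row checks (k*k per k-rowed table)."""
    return sum(comb(n, k) * k * k for k in range(2, n + 1))

def grand_total_closed_form(n):
    """Closed form (15.12); applied here for n >= 3 only."""
    return n * (comb(n, 2) * 2 ** (n - 3) + (n + 1) * 2 ** (n - 2) - 1)

for n in range(3, 13):
    total = direct_multiplications(n) + check_multiplications(n)
    assert total == grand_total_closed_form(n)
    print(f"{n:2d}  {total:7d}")
```

At the quoted rate of about 100 multiplications per hour, dividing a total by 100 gives a rough estimate of the hours of work for that number of variates.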


16. THE BUNCH MAP. SECTIONAL AND COMPLETE BUNCH MAPS. 

When the tilling is done, the result should be exhibited
graphically, otherwise it is difficult to get an ordered
impression of the mass of information which is made available.

Consider a given v-dimensional subset (αβ…γ), and let i < j
be two affixes in this set. If a regression equation is assumed
to exist in the set, and if the equation is ordered in such a way
that the variate No. i is expressed in terms of the variate No. j,
the equation will be of the form

(16.1)    $x_i = b_{ij(\alpha\beta\ldots\gamma)}\,x_j + \ldots$

where $x_i$ and $x_j$ are the variates (assumed normalised) and
$b_{ij(\alpha\beta\ldots\gamma)}$ a constant. This constant will assume different
values according to which particular regression method we use.
We consider specially the v values obtained for b by taking
successively the v elementary regressions. The material for
determining any such coefficient is present in the tilling tables;
indeed, the coefficient b determined by the h-th elementary
regression (the regression obtained by minimising in the
direction of the $x_h$ axis) is simply equal to

(16.2)    $b^{(h)}_{ij(\alpha\beta\ldots\gamma)} = -\,\dfrac{\hat r_{hj(\alpha\beta\ldots\gamma)}}{\hat r_{hi(\alpha\beta\ldots\gamma)}}$

The numerator and denominator of (16.2) are to be found in the
tilling tables.
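As a small illustration of (16.2), the sketch below reads the v elementary slopes for one pair of variates straight out of a tilling table. The table is the adjoint of a made-up correlation matrix with $r_{12} = 4/5$, $r_{13} = 3/5$, $r_{23} = 1/2$; the function name `elementary_slopes` is hypothetical.

```python
from fractions import Fraction as F

def elementary_slopes(table, labels, i, j):
    """Slopes of x_i on x_j from each elementary regression h, following (16.2).

    `table` is the tilling table (adjoint) of the set whose variate numbers
    are `labels`. A zero denominator would mean an infinite slope and is not
    handled in this sketch.
    """
    pos = {label: k for k, label in enumerate(labels)}
    slopes = {}
    for h in labels:
        numerator = table[pos[h]][pos[j]]
        denominator = table[pos[h]][pos[i]]
        slopes[h] = -numerator / denominator
    return slopes

# Tilling table of the hypothetical set (123).
table_123 = [[F(3, 4), F(-1, 2), F(-1, 5)],
             [F(-1, 2), F(16, 25), F(-1, 50)],
             [F(-1, 5), F(-1, 50), F(9, 25)]]

slopes = elementary_slopes(table_123, (1, 2, 3), i=1, j=2)
for h, b in slopes.items():
    print(f"minimising in direction {h}: slope {b}")
```

In this made-up table the three determinations differ widely, which is the kind of disagreement the bunch is meant to expose.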

For a moment let us disregard the supplementary terms in
(16.1) and think only of the connection between $x_i$ and $x_j$.
This connection may be represented by a straight line through
origin in $(x_i, x_j)$ coordinates as indicated in Figure 8.

For such a regression slope we get by (16.2) v different
determinations, namely for h = α, β … γ. It is therefore a
natural idea to draw on one and the same chart all these v
slopes, and see if they coincide fairly well.

It is however not only the slopes that interest us; it may be
useful to note also the absolute size of the numerator and
denominator in the fraction (16.2) which defines the slope. We
shall later be concerned with various conclusions drawn on the
basis of the absolute size of the numerator and denominator.
We therefore plot the point M whose abscissa and ordinate
are determined by the denominator and numerator respectively
in (16.2).

It will help further to give a clear picture of the situation
if conventionally all the lines indicating the slopes are drawn
from origin to one and the same side, say to the right (upwards
or downwards as the case may be). This is also a simplification
in the plotting work. The rule for the plotting will then
simply be: Move towards the right on the horizontal axis a
distance corresponding to the absolute value (regardless of
sign) of the tilling element that represents the denominator in
(16.2), then move downwards if the two elements to the right
in (16.2) have the same sign and upwards if they have opposite
signs, the distance to be moved vertically being equal to the
absolute value of the tilling element in the numerator in (16.2).
By convention we shall always let the variate No. i be measured
along the vertical axis and the variate No. j along the horizontal
axis when i < j.
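The plotting rule translates into a two-line function. The sketch below (the name `beam_endpoint` and the sample values are assumptions for illustration) returns the end point of one beam from the two tilling elements of (16.2).

```python
from fractions import Fraction as F

def beam_endpoint(numerator, denominator):
    """End point (x, y) of a beam drawn from the origin towards the right.

    x is the absolute value of the denominator; y is the absolute value of
    the numerator, taken downwards when the two tilling elements have the
    same sign and upwards when their signs differ. A zero denominator gives
    a vertical beam whose direction the rule leaves open.
    """
    x = abs(denominator)
    same_sign = (numerator > 0) == (denominator > 0)
    y = -abs(numerator) if same_sign else abs(numerator)
    return x, y

# Hypothetical tilling elements for three elementary regressions.
beams = {
    1: beam_endpoint(F(-1, 2), F(3, 4)),
    2: beam_endpoint(F(16, 25), F(-1, 2)),
    3: beam_endpoint(F(-1, 50), F(-1, 5)),
}
for h, (x, y) in beams.items():
    print(h, x, y, "slope", y / x)
```

The ratio y/x of each end point reproduces the slope of (16.2), while the distance from the origin carries the absolute sizes of numerator and denominator.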

For each set (αβ…γ), and for each pair of the affixes (ij)
in this set a bunch of slope lines may thus be constructed. If
the set (αβ…γ) consists of v variates, the bunch will contain
v beams, one for each variate. Each beam represents the result
obtained by minimising in the direction of that particular
variate. In the bunch representing the intercoefficient between
the variates No. i and No. j, the two beams Nos. i and j are
of particular interest. They will be called the leading beams of
the bunch.

The chart exhibiting all possible bunches in the big set
(12 … n) we call the bunch map or more explicitly the complete
bunch map for the set (12 … n). The most convenient way to
arrange it, if economy of space is not important, is to let all
the individual bunches for a given set of variates be collected
in a row, the picture of the various bunches being displayed as
cells in this row.

If we do not find it necessary to investigate the slopes in all
possible pairs of two variates, but are only interested in one
particular such slope, for instance, the one between the variates
Nos. 1 and 2, we may limit the map to these particular bunches.
Such a map we may perhaps call a sectional bunch map. For
such a map it is practical to use a somewhat different
arrangement of the cells. Figures 13 and 14 of Section 26 are an
example in 7 variates.

17. THE TESTING OF A GIVEN INTERCOEFFICIENT. USEFUL,
SUPERFLUOUS AND DETRIMENTAL VARIATES. THE STAR MAP.

When a new variate is tentatively added to a previously 
considered set, there are three fundamental possibilities to be 
considered. The variate may be useful, superfluous or detrimental 
for the purpose of the analysis as it was formulated in the 
beginning of Section 14. This classification may be applied 
either with regard to the effect of a certain variate on a given 
intercoefficient, for instance the regression coefficient between 
the variates Nos. 1 and 2, or it may be applied with regard to 
the effect of the variate in question on the regression equation 
as a whole. In the present Section we shall discuss the notions 
useful, superfluous and detrimental from the view-point of a 
given intercoefficient. In the next Section we shall consider 
a given regression equation as a whole. 

A systematic study of the bunch map, and particularly of
the change that takes place in a given bunch when we pass
from one set of variates to a more inclusive set, will furnish
criteria that go a long way towards determining whether the
various variates are useful, superfluous or detrimental.

We first formulate the convention that the precision of a
given intercoefficient will be measured roughly by the sprawling
of the various beams in the bunch that represent the
intercoefficient in question. We may imagine that we construct the
smallest sector that contains the most important of these beams.
It is not always certain that all the beams of the bunch ought
to be included in the sector measuring the precision; in
certain cases one or more of the beams may by their
exceptional behaviour indicate facts which make it plausible not to
let them influence the sector in question. The sector would
then be determined by the general behaviour of the other
beams. The two leading beams in a bunch must of course
always be included in that set of beams which define the
precision sector.
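The text defines the precision sector graphically and leaves the choice of beams to judgment. As one possible numerical stand-in (an assumption of this sketch, not a rule stated in the text), the opening angle of the smallest sector containing the retained beams can be computed from their end points. The function name and the sample end points are hypothetical.

```python
import math

def sector_width(beams, leading, excluded=()):
    """Opening angle in degrees of the smallest sector holding the retained beams.

    `beams` maps a variate number to the end point (x, y) of its beam, with
    x >= 0 since all beams are drawn towards the right. Beams listed in
    `excluded` are left out, but the two leading beams may never be excluded.
    """
    if set(leading) & set(excluded):
        raise ValueError("leading beams must stay in the precision sector")
    angles = [math.degrees(math.atan2(y, x))
              for label, (x, y) in beams.items() if label not in excluded]
    return max(angles) - min(angles)

# Hypothetical bunch for the (12) intercoefficient in a three-variate set.
beams = {1: (0.75, 0.5), 2: (0.5, 0.64), 3: (0.2, -0.02)}
full = sector_width(beams, leading=(1, 2))
trimmed = sector_width(beams, leading=(1, 2), excluded=(3,))
print(round(full, 1), round(trimmed, 1))
```

Whether a short, deviating beam such as No. 3 here should be left out remains a matter for the criteria discussed in the rest of this Section; the function only measures the consequence of that choice.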

This graphical definition of the precision of a bunch being
adopted, one proceeds to the analysis of the bunch map, first
going through all the cells representing the (12) coefficient,
then all the cells representing the (13) coefficient, etc. Under
this survey each bunch is compared with the corresponding
bunch in the first subsets of the set considered. This gives rise
to the following characterisation of the variate added.

A. A useful variate. If the bunch is tightened by the inclusion 
of the new variate and if the beam representing the new 
variate falls inside of the sector of the other beams, and further 
if the general direction of the bunch is changed, we conclude 
that the variate added is decidedly relevant. There is no doubt 
that it must be considered as useful for the determination of the 
slope in question. The behaviour of the (12) coefficient under 
the passage from the set (12) to (124) in Figure 11 of Section 24
has this property. 

The variate would still have to be considered useful, even if
the general slope did not change, provided the tightness was
sharply improved and the beam of the new variate falls inside,
as for instance in the case of the (12) coefficient going from (12)
to (123) (Fig. 11, Section 24). Both in this case and in the
preceding one the new variate has contributed essentially to the
improvement of the fit. The only difference is that the net
regression in the latter case happened to turn out about the same as
the gross slope.

Even if the beam of the new variate should fall more or less 
outside the sector containing the other beams, the new variate 
would have to be considered as useful provided the bunch in 
general is definitely tightened, or its slope definitely changed. 
The examples later to be discussed will just show that if the 
inclusion of the new variate does change the slope markedly, 
there is a chance that the new beam will fall outside, parti- 
cularly if it is not a long beam. 

The new variate may be useful although the bunch becomes
somewhat more open, namely if the general slope of the sector
changes so definitely that, even taking account of the poorer
precision, one gets a clear impression that the new slope is
significantly different from the old. For instance, if it is a
question of determining the intercoefficient between the
variates Nos. 1 and 2, and we have a situation as in Figure
9, there can be no doubt that the set (II) must be preferred to
the set (I).

[Figure 9. Regression slope between variate Nos. 1 and 2.]

If we have a situation as exhibited in Figure 9, we can say
that it is regrettable that we need to consider a set (II) where
the precision is so much poorer than in (I), but if we are looking
for the net intercoefficient between the variates Nos. 1 and 2,
we have to run the risk that is connected with using the
unprecise result in the set (II). This is just an example of a
situation where the scatterances or regression spreads or the
line coefficients are not conclusive. For instance, if a number
of the bunches for the other intercoefficients in the regression
equation behave in a similar way, the regression spread and
the line coefficient would increase when we go from the set
(I) to the set (II).

B. A superfluous variate. The criteria of a superfluous variate
are: (1) The bunch does not tighten by the inclusion of the
new variate, (2) the general slope of the bunch does not change
distinctly (or more specifically each of the beams in the bunch
remains unchanged), (3) the beam of the new variate falls
outside the sector of the other beams, (4) the beam of the new
variate is much shorter than the other beams in the new bunch,
(5) the beams of the other variates are not appreciably
shortened by the inclusion of the new variate.

If all these criteria are simultaneously fulfilled, the variate
in question is decidedly superfluous. The variate must however
be considered as superfluous even if only some of the above
criteria are fulfilled; particular importance must then be
attached to the criteria (1) to (3).

As a rule there will in practice be a fairly good agreement 
between the above criteria, the only important exception being 
that by pure chance the beam of a superfluous variate may fall 
inside the general sector of the other variates. 

If a variate is found superfluous by the above criteria, the
conclusion should be checked by considering also the zero slope
criterion. By this I mean the following: Let (ij) be the bunch
considered (that is, we consider the regression slope between
the variates Nos. i and j). Let (αβ…γ) be the old set and
(αβ…γk) the new, that is, the variate added is No. k. To
check the conclusion of the superfluity of No. k we look at the
general slope of the bunches (ik) and (jk) (and possibly other
intercoefficients involving k) in the set (αβ…γk). If these are
degenerate, that is to say near to zero when k is the horizontal
axis, or near infinity if k is the vertical axis, then this is an
additional indication of the superfluity of the variate No. k. A
still further confirmation of this conclusion will it be if the
k-beam in the (ik) and (jk) bunches (and possibly in other bunches
in the set (αβ…γk)) falls outside the sector of the other beams.

The reasons for the criteria (1) to (3) are obvious, and the
criteria (4) and (5) regarding the lengths of the beams are
derived by the following considerations. In any given
elementary regression equation (in normalised variates) one of the
coefficients (namely the coefficient of that variate in whose
direction the minimalisation is done) will be a scatterance
(when the equation is written in the homogeneous form and its
coefficients determined by the elements of the adjoint
correlation matrix). The size of this regression coefficient will
determine the general level of the other coefficients in the
equation. A given beam in a bunch in the slope map will,
therefore, in general, be all the shorter the more perfectly the
rest of the variates in the set is linearly dependent. A glance
at a given bunch will consequently immediately give an
impression of the relative importance (for the intercoefficient
considered) of the several variates in the set. They can be
arranged in a descending order of magnitude according to the lengths
of the beams in the bunch. Those variates which have the longest
beams are the more important. The fact that a variate has a
beam that is very short as compared with the other beams in
the set is therefore an additional criterion of the superfluity
of this variate. (As an example see the bunch (123456) in
Figure 1 of Section 26 and the comments attached to this
bunch).

Further, the shortening of a beam as we pass from a given
set to a more inclusive one will be all the sharper the more
perfectly the rest of the variates in the new set are linearly
connected, and the poorer collinearity there is in the rest of the
variates in the old set.

This consideration of the lengths of the beams is essentially 
an analysis of the same type as the study of scatterances and 
minimal roots. Both analyses involve about the same sort of 
information. The length of the beams is, however, only one 
aspect of the bunch analysis. Indeed, here we combine the 
study of the length of the beams with a study of their direc- 
tions and with a study of the tightness of the bunch. 

C. A detrimental variate. Finally, if the bunch explodes by the
inclusion of the new variate, that is to say, if it becomes much 
less tight than before, the new variate must be considered as 
detrimental. As typical examples we may consider the be- 
haviour of the (12), (13) and (23) bunches when we pass from 
the set (123) to (1234) in Figure 1 of Section 24. 

The new variate must be considered as detrimental even
though there is, strictly speaking, no explosion, but only such
a distinct decrease in the tightness and so little change in the
general slope of the bunch that we cannot say definitely
whether the new slope is different from the old or not.

When one gets accustomed to working with the bunch map
so that one really understands "the language it speaks", one
will discover that it is a veritable gold mine of information and
a most powerful tool of analysis.

This classification of the variates as useful, superfluous and
detrimental may be done for each intercoefficient and for the
passage from any variate set to its supersets. In order to keep
track of the large number of cases that thus arise it is
convenient to condense the conclusions in some sort of graphical
representation. I have found the "star" map exhibited in the
example of Section 24 very convenient. Here an asterisk
indicates a useful variate, an empty circle indicates a
superfluous variate and a blackball a detrimental variate. For
instance, the asterisk on the third line in the horizontal section
(234) and in the column (24) indicates that when the purpose
is to determine the net intercoefficient between the variates
Nos. 2 and 4 then it is correct to add the variate No. 3 to the
set (24).

The above discussion refers to a comparison between a 
given set and its supersets, or between a given set and its sub- 
sets. It is however also of interest to compare sets on the same 
dimensionality level. Such a comparison may be made with 
regard to the behavior of the bunch of any given inter- 
coefficient. A few examples will illustrate the situations which 
may here arise. 

Suppose that we consider the bunch of the (12)
intercoefficient, this bunch being very tight in the set (123) and also very
tight in the set (124). Further suppose that the general slope of
the (12) connection is markedly different in these two sets. This
would be a strong indication that there exist good linear
relations in both the three-sets considered, and further that this
is not the same relation. In other words, neither of the two
measurements is biassed by our failure to take account of
some variate. The difference is simply due to the fact that we
are measuring two different things when we determine the
general slope of the (12) bunch in the set (123) and in the set
(124). In this case I shall say that the change in the general
slope of the (12) bunch as we pass from one to the other of the
two sets (123) and (124) is a multilinear effect. Obviously in
this case the big set that contains both the three-sets
considered, namely (1234), is multiply collinear, and a regression
equation in (1234) would therefore have no meaning.

On the other hand it may happen that the change in slope as 
we pass from (123) to (124) is due to the fact that both these 
slopes are biassed, the one in (123), because 4 is not taken 
account of, and the one in (124) because 3 is not taken account 
of. In this case I shall say that the change in slope is a gross 
slope effect. This is just the case where the correct solution is 
to unite the sets to a bigger set, namely (1234) and consider 
the regression here. 

Which one of the two alternatives we have will be express- 
ed by the behavior of the (12) bunch in the big set. If it ex- 
plodes, we may take it as a sign that we have a multilinear 
effect, but if it tightens still more, we may conclude that we 
have a gross slope effect. 

There are also other cases. Suppose for instance that there
exists a good structural relation in (12345) and another in
(12346). It would then have a meaning to consider a
regression in the set (12345), and it would also have a meaning to
consider one in the set (12346), but it would have no meaning
to consider one in (123456). Suppose now that we consider
statistically the two sets (123) and (124) and find that the (12)
slope changes markedly as we pass from one to the other of
these three-sets. This is of course in a sense a gross slope
effect since in one case we have failed to take account of 3 and
in the other we have failed to take account of 4, and both these
variates are necessary to obtain a good regression equation.
But which "true" coefficient is it of which the two observed
results can be said to represent biassed measurements? In point
of principle the question has no definite answer, because we
may conceive of it either as the (12) slope in (12345) or in
(12346). Both (123) and (124) are indeed contained in both
these five-sets. In actual fact the (12) coefficient in (123) may
be nearest to the "true" (12) coefficient in (12345) while
that in (124) may be nearest to the "true" coefficient in (12346),
or the (12) coefficient observed in both three-sets may be
nearest to the "true" coefficient in (12345) etc. Which alternative
we shall have, will depend on the amount of intercorrelation
that happened to be present between the various variates in the
data at hand. We may refer to this as the mixed case.

If we got the idea of introducing also 5 and 6 amongst the 
variates tested, we would by the bunch technique, in all pro- 
bability be guided towards the right conclusion, namely that 
(12345) and (12346) form two independent collinear sets. 

18. THE TESTING OF THE VARIATE-SETS. CLOSED SETS AND
ADMISSIBLE REGRESSIONS.

The star map is in a sense the "difference" map of the bunch
map. It expresses what happens each time we add a variate,
while the bunch map expresses the situation that exists after
the variate is added. By combining the criteria contained in
these two maps we shall now discuss the significance of the
various regressions (each regression taken as a whole)
and thereby try to get an idea of the confluence hierarchy that
exists in the variates. To a large extent this will consist in
checking whether there is agreement between the conclusions
reached by studying the individual intercoefficients.

It is not necessary, nor indeed possible, here to give a
complete account of all possible cases that may arise. It will
be sufficient to indicate the most important cases and the
principles by which they are classified. One who wants to
apply the method, will then no doubt himself be able to work
out the detailed interpretation of the cases which he encounters.

I. A closed set. If we have a set, as, for instance, (234) in
the example of Sections 23 and 24, where the star map displays
only asterisks, and where the slope map indicates a high degree
of tightness all the way through, then we may conclude that
all the variates belong in the set, and when these variates are
taken together, they give a good fit to a linear relationship.
All the coefficients in the regression equation in this set may
be considered significant. Such a set will be called closed and
the regression equation in this set an admissible regression
equation. If there can be found two (or more) sets that appear
as closed according to this criterion, and if the bigger set
obtained by uniting these two sets shows explosions in the
various coefficients, in other words, if this bigger set has the
features mentioned under IV below, then we may consider this
as a check on the conclusion that the lower sets considered
were closed.

II. A promising but not complete set. If we have a set where
the star map shows asterisks throughout but where the bunch
map does not yet indicate sufficient tightness in the various
bunches, we conclude that all the variates included so far are
significant, and should stay in, but we should continue to search
for some new variate that may be able to improve the fit.

III. A promising set hut with some extraneous vanate{-s). If there 
are some circles in the set, and particularly if several of these 
consist in pointing out the same variate as superfluous, and still 
more if this indication is checked by the additional zero slope 
criterion mentioned in Section 17, then the variate (or variates) 
in question may be left out. Some caution ought however 
to be taken, because it is conceivable that, by including some 
entirely new variate (or variates) not yet considered, the 
situation may be changed in such a way that the variate that 
first appeared as superfluous now proves to be somewhat 
useful. To be more precise: The inclusion of the entirely new 
variate (or variates) may so clarify the situation that it becomes 
possible to look for finer traits of the regression equations. And 
these finer traits may indicate that the variate that was first 
’’circled” exerts, after all, some influence. We may then 
interpret the circle first put on this variate to mean that it was 
superfluous as compared with one (or more) other variates which 
it was more important to take into account. 

IV. An inadmissible set with closed subsets. If we have a set, 
as for instance (1234) in the example of Section 24, which in 
the star map is represented exclusively by blackballs, the set 
must be interpreted as multiply collinear. This conclusion is 
supported if we not only have a star map that indicates black- 
balls, but if the tightness in the total set — as shown by the 
bunch map — is actually very poor. This is the case in the 
example considered. The beams in the various bunches in the 
set (1234) are indeed sprawling excessively. And the conclus- 
ion is still more definite if the bunches in the subsets were 
actually quite tight so that we have clear cut explosions for all 
the bunches as we pass from the subsets to the total set. This 
is actually so in the example considered. 

Of course, if we find a set that shows such a definite black- 
ball picture, we need not interpret this as a deplorable result. 
If the tightness of the bunches in the subsets of the blackball-set 
is fairly good, the blackball-situation found may be 
interpreted as indicating that the investigation is now carried to 
completion. It means that the law of variation in the subsets 
furnishes already all the information that can be got out of the 
observed product moments. In other words the cycles were 
already closed before we put all the variates together into one 
total set. 

V. An inadmissible set with unfinished subsets. If the tightness 
in the bunches of the subsets is not satisfactory, the blackball- 
situation found in the total set indicates that the thing needed 
in order to obtain a better fit is not to take account of the fact 
that all the variates considered may change simultaneously and 
independently, but to look for some entirely new variate. And this 
new variate must again in turn be judged by the asterisk, circle 
and blackball criterion. 

The application of these principles will be illustrated by the 
examples in Sections 23 — 29. 
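
The five cases above can be summarised as a small decision rule. The following is only a sketch: the names `Mark` and `classify_set` are hypothetical, and the judgement of "tightness" (which in the text is a visual inspection of the bunch map) is reduced here to a yes/no flag supplied by the analyst.

```python
from enum import Enum


class Mark(Enum):
    """Star-map symbols as used in the text (names are illustrative)."""
    ASTERISK = "asterisk"    # variate belongs in the set
    CIRCLE = "circle"        # variate appears superfluous
    BLACKBALL = "blackball"  # set appears multiply collinear


def classify_set(marks, set_is_tight, subsets_are_tight=False):
    """Return the roman numeral of the case (I to V), or None if mixed.

    marks             : star-map marks of the set
    set_is_tight      : analyst's judgement of the bunches in the set
    subsets_are_tight : analyst's judgement of the bunches in the subsets
    """
    marks = list(marks)
    if not marks:
        raise ValueError("at least one star-map mark is required")
    if all(m is Mark.BLACKBALL for m in marks):
        # Cases IV and V differ only in the tightness of the subsets.
        return "IV" if subsets_are_tight else "V"
    if any(m is Mark.BLACKBALL for m in marks):
        return None  # mixed picture, not covered by the five cases
    if any(m is Mark.CIRCLE for m in marks):
        return "III"
    # Only asterisks remain: closed (I) or promising but incomplete (II).
    return "I" if set_is_tight else "II"


print(classify_set([Mark.ASTERISK] * 3, set_is_tight=True))
```

Such a rule cannot replace the inspection of the maps themselves; it only records the conclusion once the marks and the tightness have been judged.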

19. MEAN VALUES AND BOUNDARY VALUES OF REGRESSION COEF- 
FICIENTS. THE SIGNIFICANCE FACTOR 

When we have settled the question of which one (or ones) of 
the regression equations are to be considered as admissible, we 
want to indicate limits of significance for the regression coeffic- 
ients within each such equation. On the basis of the discussion 
in Sections 14 and 17 we adopt heuristically the two opposite 
elementary regression slopes as probable boundaries for the 
regression coefficient in question. This means that if the 
regression equation is written in the form (16. 1) we adopt as 
boundaries for $b_{ij(\alpha\ldots\gamma)}$ the numbers 

(19. 1)    $-\dfrac{\hat r_{ij(\alpha\ldots\gamma)}}{\hat r_{ii(\alpha\ldots\gamma)}}$   and   $-\dfrac{\hat r_{jj(\alpha\ldots\gamma)}}{\hat r_{ji(\alpha\ldots\gamma)}}$ 

where $\hat r_{ij(\alpha\ldots\gamma)}$ are the elements of the adjoint correlation 
matrix in the set $(\alpha\ldots\gamma)$. 

These boundaries are invariant for a permutation of the two 
variates. This is seen simply by noticing that if we interchange 
i and j in (19. 1), each limit becomes equal to the reciprocal of 
what the other limit was originally. That is to say if we write 
the regression equation in the form 

(19. 2)    $x_j = \ldots + b_{ji(\alpha\ldots\gamma)}\,x_i + \ldots$ 

the boundaries for $b_{ji}$ will be the reciprocals of the boundaries 
for $b_{ij}$. 

This suggests that it is more natural to indicate the signifi- 
cance of the regression coefficients by a factor of uncertainty 
than by an additive term (as we do when we indicate in the 
usual way the standard error of the regression coefficients). 
In view of this fact, it seems natural to adopt as the mean re- 
gression coefficient the geometric average between the absolute 
values of the two boundaries, and then determine the sign by 
means of $\hat r_{ij(\alpha\ldots\gamma)}$. This gives 

(19. 3)    $b_{ij(\alpha\ldots\gamma)} = -\operatorname{sgn}\bigl(\hat r_{ij(\alpha\ldots\gamma)}\bigr)\sqrt{\dfrac{\hat r_{jj(\alpha\ldots\gamma)}}{\hat r_{ii(\alpha\ldots\gamma)}}}$ 

as the mean regression coefficient, and 

(19. 4)    $\Theta_{ij(\alpha\ldots\gamma)} = \dfrac{|\hat r_{ij(\alpha\ldots\gamma)}|}{\sqrt{\hat r_{ii(\alpha\ldots\gamma)}\,\hat r_{jj(\alpha\ldots\gamma)}}}$ 

as the factor of significance. 

The coefficient (19. 3) is of course nothing but the diagonal 
mean regression coefficient between the variates Nos. i and j. 
Furthermore, the factor Θ is nothing but the absolute value of 
the familiar partial correlation coefficient. The use that is here 
made of it, namely as the significance factor (the "multiplica- 
tive standard error") of a diagonal regression coefficient is 
however not usual. Since Θ is the absolute value of a correla- 
tion coefficient, it must of course lie between 0 and 1. 
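
As a numerical illustration, the sketch below computes the two boundary slopes, their geometric mean and the significance factor from the adjoint of a correlation matrix. It assumes the reading of (19. 1), (19. 3) and (19. 4) in which the boundaries are $-\hat r_{ij}/\hat r_{ii}$ and $-\hat r_{jj}/\hat r_{ji}$; the function names are hypothetical, and the adjoint is obtained by plain cofactor expansion rather than by the tilling technique.

```python
import math


def minor(m, i, j):
    """Matrix m with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det(m):
    """Determinant by cofactor expansion (adequate for small sets)."""
    if not m:
        return 1.0
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))


def adjoint(m):
    """Adjoint (transposed cofactor) matrix."""
    n = len(m)
    return [[(-1) ** (i + j) * det(minor(m, j, i)) for j in range(n)]
            for i in range(n)]


def diagonal_regression(adj, i, j):
    """Return (mean coefficient, significance factor, (boundary_1, boundary_2)).

    Assumes adj[i][j] != 0, otherwise the second boundary is undefined.
    """
    b1 = -adj[i][j] / adj[i][i]          # minimising in the i direction
    b2 = -adj[j][j] / adj[j][i]          # minimising in the j direction
    mean = -math.copysign(math.sqrt(adj[j][j] / adj[i][i]), adj[i][j])
    theta = abs(adj[i][j]) / math.sqrt(adj[i][i] * adj[j][j])
    return mean, theta, (b1, b2)


# Two-set with correlation 0.752502 (the pair (1, 4) of Section 23).
r = 0.752502
mean, theta, bounds = diagonal_regression(adjoint([[1.0, r], [r, 1.0]]), 0, 1)
print(mean, theta, bounds)
```

The two boundaries are recovered from the mean by multiplying and dividing by Θ, which is the sense of the "multiplicative standard error".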

It should be noticed that the significance limit Θ only applies 
to regression coefficients in equations that have been recognised 
as admissible by the whole bunch map and star map technique; Θ 
should not be taken as a criterion that is sufficient in itself. 

In numerical work we shall add the significance factor in 
the same way as one usually adds the standard errors, putting, 
however, the sign ·/· between the two figures instead of ±, so 
as to indicate that it is here a question of "multiplication or 
division", not a question of "addition or subtraction". The dots 
in ·/· may mnemotechnically be interpreted as the multiplication 
sign, and the fractional bar as the division sign. In the 
numerical examples of Sections 23—29 this notation is used. 

In any given admissible subset $(\alpha\ldots\gamma)$ the intercoefficient 
between any pair of two variates is defined by (19. 3). If the 
equation as such in this set shall have a meaning, the various 
intercoefficients ought to be compatible, for instance $b_{32}$ 
ought to be equal to minus $b_{12}$ divided by $b_{13}$, etc. Quite 
generally we ought to have 

(19. 6)    $b_{ij} = -\dfrac{b_{kj}}{b_{ki}}$ 

This condition is fulfilled if the intercoefficients are defined by 
(19. 3), provided only that the signs of the rows of the adjoint 
correlation matrix are compatible. If the signs are not 
compatible, an inspection of the bunch map will in any case tell 
which sign to use. If there is such a high degree of disorgani- 
sation that the sign is not distinctly defined by the bunch map, 
we have a set of variates that ought not to have passed 
through the severe tests of the bunch map and the star-map 
analysis. 
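
The compatibility claim can be checked in one line. The derivation below is a sketch that assumes the diagonal mean regression coefficient has the form $b_{ij} = -\operatorname{sgn}(\hat r_{ij})\sqrt{\hat r_{jj}/\hat r_{ii}}$, with $\hat r$ the adjoint correlation elements in the set considered (the set affix is dropped for brevity).

```latex
\begin{align*}
-\frac{b_{kj}}{b_{ki}}
  &= -\,\frac{-\operatorname{sgn}(\hat r_{kj})\sqrt{\hat r_{jj}/\hat r_{kk}}}
             {-\operatorname{sgn}(\hat r_{ki})\sqrt{\hat r_{ii}/\hat r_{kk}}} \\
  &= -\operatorname{sgn}(\hat r_{ki})\operatorname{sgn}(\hat r_{kj})
      \sqrt{\frac{\hat r_{jj}}{\hat r_{ii}}}
\end{align*}
```

The magnitudes therefore always agree, and the expression equals $b_{ij}$ exactly when $\operatorname{sgn}(\hat r_{ij}) = \operatorname{sgn}(\hat r_{ki})\operatorname{sgn}(\hat r_{kj})$, which is what compatibility of the signs of the rows of the adjoint matrix amounts to.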

Of course I have here made no attempt to specify exactly 
what the probability is that the ’’true” regression coefficient 
shall fall outside the significance limits indicated by Θ. For 
the reasons previously indicated I prefer to rely on the 
purposely vague statement that it is ’’very improbable” that 
the ’’true” coefficient will fall outside of these limits. 

20. SOME TEST-PARAMETERS THAT MAY BE USED AS SECONDARY 
CRITERIA. 

The complete analysis by the bunch map and star map is the 
ultimate test by which to judge empirically the hierarchy of 
linear confluency. If ever there is a conflict between the con- 
clusion indicated by the maps and by some other empirical cri- 
teria, for instance some of the test parameters discussed in Part 
I, the result reached by the complete map study is to be 
considered as the final word. 

This does not prevent certain test-parameters from being of some 
use as secondary criteria. We have already seen that when the 
slope map and star map analysis have decided about the 
admissible regressions, the partial correlation coefficients may 
be used to indicate significance limits for the coefficients. If 
these limits are computed for all possible intercoefficients and 
tabulated for all possible subsets in tables of the same arrange- 
ments as the tilling tables, we even get an approximate idea 
of one feature of the bunches, namely the angle between the 
two leading beams in each bunch; obviously the partial correla- 
tion coefficient is an expression for this angle. This is, however, 
only one of the many features that are actually taken account 
of in the complete bunch map and star map analysis, so that no 
discussion of partial correlation coefficients, even if made in 
this exhaustive way, can replace the map analysis. There is 
also the practical consideration that it takes at least as much 
time to compute the complete system of partial correlation 
coefficients (on the basis of the tilling result) as to plot the 
bunch map. In practice one will therefore hardly ever find it 
worth while to compute the complete system of partial cor- 
relations. The only parameter system which it is always 
convenient to compute completely in practice is that involved 
in the tilling tables. 

While the partial correlation coefficients (or, which in 
practice amounts to the same, the factors Θ defined by (19. 4)) 
express the sprawling between the two leading beams in a 
bunch, the corresponding coefficients, formed by the Ω's of 
Section 6, namely 

( 20 . 1 ) = + 

V il'ii -ttjj 

is a composite expression for the sprawling of all the beams in 
the bunch considered. In doubtful cases when it is difficult to 
decide by the visual inspection of the slope map whether a 
certain variate is useful or superfluous, or whether an equation 
should be taken as admissible or not, it may be found worth 
while to compute the parameter (20. 1) as a supplementary indication. 

A coefficient that is influenced by a still larger number of 
bunch characteristics is the line coefficient defined by (6. 1); 
it expresses the sprawling of all the bunches in a given set of 
variates. 

While the partial correlation coefficients (or, which amounts 
to the same, the factor Θ), as suggested above, may be of some 
use in the study of linear confluency, the multiple correlation 
coefficient is in my opinion of no use, or rather it is a very 
misleading and dangerous parameter. Examples of this are 
mentioned in Section 1. 

21. COMPATIBILITY SMOOTHING OF REGRESSION COEFFICIENTS 
IN OVERLAPPING SUBSETS. THE METHOD OF DIAGONAL ZEROS. 

Suppose that the investigation has led up to the conclusion 
that there is a certain set (12 . . . n) which is twofold collinear, 
that is to say any of the n subsets obtained by leaving out one 
variate at a time is a closed set possessing a significant 
regression equation. If this is so, the coefficients of these n 
regressions ought to be compatible in the sense that any of these 
equations is deducible from any two of the other equations (by 
eliminating between them the variate that is lacking in the 
equation which it is wanted to deduce). If the regression equa- 
tions within each of the subsets are determined by the diagonal 
regression method (or by any other empirical method), the 
coefficients in the set of n equations so obtained may however, 
not have this property exactly. The problem therefore arises 
to smooth the coefficients in such a particular way that the 
property in question is assured. 

A similar but more general problem arises if the investiga- 
tion has led to a certain set (12 ... n) that is threefold collinear 
or more. In this general case we may formulate the problem 
thus: There is determined empirically an N rowed and n column- 
ed matrix of regression coefficients 

(21. 1)    $\|A_{Ki}\| = \begin{Vmatrix} A_{11} & \cdots & A_{1n} \\ \vdots & & \vdots \\ A_{N1} & \cdots & A_{Nn} \end{Vmatrix}$ 

This matrix defines N equations of the form 

(21. 2)    $A_{K1}x_1 + \ldots + A_{Kn}x_n = 0 \qquad (K = 1, 2 \ldots N)$ 

The coefficients shall be smoothed in such a way that the 
matrix (21. 1) becomes of rank κ. 

The fact that (21. 1) is of rank κ means that κ of the equations 
(21. 2) are independent, which in turn means that the scatter in 
$(x_1 \ldots x_n)$ has lost κ dimensions, in other words, the scatter is 
left with an unfolding capacity of n − κ. 

The case mentioned in the beginning of this Section is the 
case κ = 2, N = n. I shall first indicate a rapid method of 
compatibility smoothing applicable to this special case, and in 
the next Section I shall give a method applicable to the 
general case. 

In the case κ = 2, N = n we choose such a numbering of the 
equations that the equation No. K is the one where the variate 
No. K is lacking. In other words all the diagonal elements in 
(21. 1) are zero. 

Let us first normalise the coefficients so as to make them 
comparable in size. A rapid and convenient way to do this is 
by means of the absolute-value norms. This means that we 
determine the sum of the absolute values of the elements in 
each row in (21. 1), and then divide each element by the 
absolute row sum in the row to which the element belongs. 
We thus obtain a new matrix 

(21. 3)    $\|a_{Ki}\|, \qquad a_{Ki} = \dfrac{A_{Ki}}{|A_{K1}| + \ldots + |A_{Kn}|}$ 

where all the absolute row sums are equal to unity. This 
property is used as a check on the normalisation. For the 
purpose of checks on the other computations to be considered it 
is convenient to compute also the natural row sums of (21. 3). 

Consider one particular of the rows in (21. 3), say No. K. 
Let Nos. P and Q be two other rows. Multiplying the row P 
by some constant C and the row Q by a constant D and adding, 
we ought to get a new row, whose elements 

(21. 4)    $C\,a_{Pi} + D\,a_{Qi}$ 

are proportional to those of the row K, namely $a_{Ki}$. If the 
matrix (21. 3) was exactly of rank 2, there would in general be 
one definite ratio between C and D which we would have to 
select in order that (21. 4) should be proportional to $a_{Ki}$. If the 
matrix (21. 3) is not exactly of rank 2, it will in general not be 
possible to choose the ratio between C and D so as to obtain 
exact proportionality; C and D can only be chosen so as to 
obtain the "best possible" fit. This means that some more or 
less plausible principle must be adopted for the choice of C and 
D. In the present case where all the diagonal elements $a_{KK}$ are 
zero it seems plausible to choose C and D in such a way that 
the diagonal zero is maintained. This leads to 

(21. 5)    $C/D = -\,a_{QK}/a_{PK}$ 

Hence the elements of the new row No. K must be proportional to 

(21. 6)    $a_{QK}\,a_{Pi} - a_{PK}\,a_{Qi}$ 

In practical computation it is convenient to put the new row 
either equal to plus the expression (21. 6) or equal to minus 
this expression, the sign to be chosen so that the elements in 
the new row get the same signs as the old. Of course we 
assume that the signs of (21. 6) for varying i will be compatible 
with the signs of $a_{Ki}$, otherwise the incompatibility between the 
equations treated would be so great that the attempt at 
reconciling them should be given up and the admissibility 
criteria for the regressions should be reconsidered. 

For each given K, the row numbers P and Q in (21. 6) may 
be chosen in different ways. This means that we obtain 
$\binom{n-1}{2}$ different rows which can be compared with the row 
No. K. To make the comparison easier we reduce all these new 
rows to absolute row sum unity. In other words, we form 

(21. 7)    $a_{Ki(PQ)} = \omega\,(a_{QK}\,a_{Pi} - a_{PK}\,a_{Qi})$ 

where ω is a constant so chosen that the absolute row sum 
becomes unity and the signs of (21. 7) coincide with those of 
$a_{Ki}$. If the matrix (21. 3) is of rank 2, we shall have exactly 
$a_{Ki(PQ)} = a_{Ki}$ for any set (PQ). The distribution of the values of 
$a_{Ki(PQ)}$ around $a_{Ki}$ for all possible sets (PQ) is characteristic for the 
degree to which the matrix falls short of having the property 
which it is the purpose of the smoothing to establish. All the 
values (21. 7) should therefore be tabulated in the way in- 
dicated in the example of Section 25. In works of this kind the 
natural row sums of (21. 3) and of the derived matrices should 
be used throughout for checking purposes. 

Let 

(21. 8)    $\bar a_{Ki}$ 

be the average of all the coefficients $a_{Ki(PQ)}$ which are to be compared 
with $a_{Ki}$. If $\bar a_{Ki}$ is not exactly equal to $a_{Ki}$ a compromise must 
be made; $a_{Ki}$ must be somewhat modified on the basis of the 
information contained in the other regression equations. A 
plausible solution seems to be to adopt as the smoothed coeffi- 
cient the simple average between $a_{Ki}$ and $\bar a_{Ki}$, that is, the 
smoothed coefficient is put equal to 

(21. 9)    $\dfrac{a_{Ki} + \bar a_{Ki}}{2}$ 

If the matrix does not yet come close to being of rank 2, 
the whole process may be iterated. Since the matrix $\bar a_{Ki}$ 
has absolute row sums equal to unity, and $a_{Ki}$ and $\bar a_{Ki}$ always 
have the same sign, the matrix (21. 9) must have absolute row 
sums equal to unity; it is therefore already in shape to be taken 
as the starting point for a new smoothing. 

The example discussed in Section 25 shows that this process 
converges with an extraordinary rapidity. Already the second 
smoothing according to this method assured exact compatibility 
in the first 10 decimal places. The discussion of that example 
even seems to indicate that the simple arithmetic average 
chosen in the formula (21. 9) represents in a sense an optimum 
choice. It seems to be the linear combination between $\bar a_{Ki}$ and 
$a_{Ki}$ that will produce the most rapid convergence possible. 
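
One smoothing step of the method of diagonal zeros may be sketched as follows. The function names are hypothetical, and one simplification is introduced here that is not in the text: the sign of each derived row is fixed by requiring a non-negative inner product with the original row, instead of comparing signs element by element. The matrix is assumed square with zero diagonal, at least three rows, and already reduced to absolute row sum unity.

```python
from itertools import combinations


def abs_normalise(row):
    """Divide a row by the sum of the absolute values of its elements."""
    s = sum(abs(x) for x in row)
    return [x / s for x in row]


def smooth_once(a):
    """One pass of the diagonal-zero smoothing on a square matrix a."""
    n = len(a)
    smoothed = []
    for K in range(n):
        others = [r for r in range(n) if r != K]
        derived = []
        for P, Q in combinations(others, 2):
            # Combination of rows P and Q that keeps the zero in column K.
            row = [a[Q][K] * a[P][i] - a[P][K] * a[Q][i] for i in range(n)]
            # Simplified sign rule (an assumption of this sketch).
            if sum(x * y for x, y in zip(row, a[K])) < 0:
                row = [-x for x in row]
            derived.append(abs_normalise(row))
        # Average over all pairs (P, Q), then average with the old row.
        mean = [sum(col) / len(derived) for col in zip(*derived)]
        smoothed.append([(x + m) / 2 for x, m in zip(a[K], mean)])
    return smoothed


# A matrix that is exactly of rank 2 is left unchanged by the smoothing.
exact = [[0.0, 0.5, 0.5], [-0.5, 0.0, -0.5], [-0.5, 0.5, 0.0]]
print(smooth_once(exact))
```

Calling `smooth_once` repeatedly on its own output corresponds to the iteration described above.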

22. COMPATIBILITY SMOOTHING CONTINUED. A GENERAL METHOD. 

The method of compatibility smoothing described in Section 
21 has a natural application only in the case κ = 2, N = n, 
where the $a_{Ki}$ is a square matrix with zeros in the diagonal. 
Indeed, these zeros just served to define the parameters C and D. 
In the general case some other principle must be relied upon. 

If the matrix $a_{Ki}$ is of rank exactly κ it should be possible 
to express the row No. K as a linear combination of κ other 
rows, in other words we should have, for any i 

(22. 1)    $a_{Ki} = C_P\,a_{Pi} + C_Q\,a_{Qi} + \ldots + C_R\,a_{Ri}$ 

where P, Q ... R is a set of κ different numbers chosen in the 
sequence 1, 2 ... N, and the C's are coefficients independent of 
i. How shall the coefficients C be selected in order to assure 
the best possible fit? 

A natural idea seems to be simply to determine them as the 
regression coefficients of $a_{Ki}$ on $a_{Pi}, a_{Qi} \ldots a_{Ri}$, i being the 
variable parameter that defines the various "observations". 
This leads to the following procedure. 

Let the matrix (21. 1) now be reduced by the square row 
norms, in other words put 

(22. 2)    $a_{Ki} = A_{Ki} \big/ \sqrt{A_{K1}^2 + \ldots + A_{Kn}^2}$ 

so that 

(22. 3)    $a_{K1}^2 + \ldots + a_{Kn}^2 = 1$ for all K 

Then form the row moments 

(22. 4)    $\mu_{HK} = \sum_i a_{Hi}\,a_{Ki}$ 

(in particular by (22. 3) $\mu_{KK} = 1$), and compute (most conveniently 
by the tilling technique) the adjoint elements in all (κ + 1) 
rowed subsets 

(22. 5)    $\hat\mu_{KH(UV\ldots W)}$ 

(UV ... W) being (κ + 1) numbers chosen in the sequence 1, 2 ... N 
and the adjunction being made in this set. With this notation 
the coefficient $a_{Ki(PQ\ldots R)}$ which is obtained via the equations 
(PQ ... R), and is to be compared with $a_{Ki}$, is defined by 

(22. 6)    $\hat\mu_{KK(KPQ\ldots R)}\; a_{Ki(PQ\ldots R)} = -\sum_H \hat\mu_{KH(KPQ\ldots R)}\; a_{Hi}$ 

where H runs through all the κ numbers P, Q ... R (but not K), 
and $\hat\mu$ are the adjoints taken in the (κ + 1) rowed set (KPQ ... R). 
There is no need to normalise the magnitudes $a_{Ki(PQ\ldots R)}$ de- 
termined by (22. 6) since by this formula they have already 
been fitted to $a_{Ki}$ both in sign and order of magnitude. 

For each K there will by (22. 6) be obtained $\binom{N-1}{\kappa}$ different 
values to compare with $a_{Ki}$. In the case of κ > 2 it will be a 
little troublesome to tabulate all these values, as we did in the 
method of Section 21. If it is not particularly wanted to see 
each value, the tabulation may, however, now be omitted because 
the present method is such that the averaging may be done in 
the algebra of the formula. In order to do this we first make 
the convention that (22. 5) shall be interpreted as zero when- 
ever one, or both, the affixes (HK) lack in the set (UV ... W). 
With this convention the summation in the right member of 
(22. 6) may be interpreted to run through all the values 1, 2 ... N 
except K. This is a formal advantage when we want to per- 
form an averaging over all possible sets (PQ ... R). The average 
of the magnitudes (22. 6) using the $\hat\mu_{KK(KPQ\ldots R)}$ as weights, 
will now be 

(22. 7)    $\bar a_{Ki} = -\sum_H M_{(KH)}\,a_{Hi} \big/ M_{(KK)} \qquad H = 1, 2 \ldots )K( \ldots N$ 

where 

(22. 8)    $M_{(KH)} = \sum_{PQ\ldots R} \hat\mu_{KH(KPQ\ldots R)}$ 

the summation in (22. 8) running over combinations without repeti- 
tion of the κ affixes (PQ ... R). This is obviously the same as 
$\sum \hat\mu_{KH(UV\ldots W)}$ where the (κ + 1) affixes (UV ... W) run through 
those special combinations that contain K. By the above 
convention this is however in turn the same as if (UV ... W) 
runs through all possible combinations, no matter whether K 
is present or not. We therefore have 

(22. 9)    $M_{(KH)} = \sum_{UV\ldots W} \hat\mu_{KH(UV\ldots W)}$ 

where (KH) are any two affixes (equal or unequal) in the set 
1, 2 ... N and $\sum_{UV\ldots W}$ denotes a summation over combinations 
without repetition of the (κ + 1) affixes UV ... W picked in the 
set 1, 2 ... N. Thus the N-rowed square matrix $M_{(KH)}$ is simply 
formed by adding all the (KH) elements that occur in the (κ + 1) 
rowed tilling tables for μ. The matrix $M_{(KH)}$ may be formed once 
for all and applied to all the averagings (22. 7). When the 
matrix $M_{(KH)}$ is formed, each of the magnitudes $\bar a_{Ki}$ that is to be 
compared with $a_{Ki}$ is formed as a linear compound, namely 
(22. 7) of all the (N − 1) coefficients $a_{Hi}$ where H ≠ K. In order 
to omit the term H = K in the summation (22. 7) and get the 
proper sign it will in practice be most convenient to make a 
complete square table of the matrix $-M_{(KH)}$ (which, in- 
cidentally, is symmetric) divide by $M_{(KK)}$ and replace all the 
diagonal elements by zeros. The quantities $\bar a_{Ki}$ are then simply 
formed by taking the product sum of a column in this modified 
matrix and a column in the original a matrix. 

Since by the computation of the accumulated matrix (22. 9) 
each element in the (κ + 1)-rowed tilling tables is used once, 
and only once, various forms of checks may be applied. We 
may for instance verify that: 

The sum of the diagonal elements in $M_{(KH)}$ is equal 
to the sum of the diagonal elements in all the (κ + 1) 
rowed tilling tables, and similarly for the north-east 
triangle. 

Using the same argument as in Section 21, we put the 
smoothed value of $a_{Ki}$ equal to 

( 22 . 10 ) 

(22. 11)    $a'_{Ki} = \dfrac{a_{Ki} + \bar a_{Ki}}{2}$ 

If necessary the values $a'_{Ki}$ may be taken as the starting 
point for a second smoothing, etc. 
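
The general procedure can be sketched in a few lines. This is an illustration under stated assumptions, not the tilling computation of the text: the adjoints are obtained by plain cofactor expansion, the function names are hypothetical, and the formulas followed are the readings $\bar a_{Ki} = -\sum_{H \ne K} M_{(KH)} a_{Hi} / M_{(KK)}$ for the weighted average and the simple arithmetic mean for the smoothed value.

```python
import math
from itertools import combinations


def minor(m, i, j):
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det(m):
    if not m:
        return 1.0
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))


def adjoint(m):
    n = len(m)
    return [[(-1) ** (i + j) * det(minor(m, j, i)) for j in range(n)]
            for i in range(n)]


def general_smooth(A, kappa):
    """Return (a, a_bar, smoothed) for an N x n coefficient matrix A."""
    N, n = len(A), len(A[0])
    # Square row norms: every row of a gets unit sum of squares.
    a = [[x / math.sqrt(sum(y * y for y in row)) for x in row] for row in A]
    # Row moments mu_HK.
    mu = [[sum(x * y for x, y in zip(a[H], a[K])) for K in range(N)]
          for H in range(N)]
    # Accumulate the adjoint elements over all (kappa + 1)-rowed subsets.
    M = [[0.0] * N for _ in range(N)]
    for subset in combinations(range(N), kappa + 1):
        adj = adjoint([[mu[u][v] for v in subset] for u in subset])
        for p, u in enumerate(subset):
            for q, v in enumerate(subset):
                M[u][v] += adj[p][q]
    # Weighted average of all regression-based reconstructions of row K.
    a_bar = [[-sum(M[K][H] * a[H][i] for H in range(N) if H != K) / M[K][K]
              for i in range(n)] for K in range(N)]
    smoothed = [[(x + y) / 2 for x, y in zip(ra, rb)]
                for ra, rb in zip(a, a_bar)]
    return a, a_bar, smoothed


# Rows of an exactly rank-2 matrix are reproduced by their own average.
a, a_bar, smoothed = general_smooth([[0, 1, 1], [-1, 0, -1], [-1, 1, 0]], 2)
print(a_bar)
```

For larger N the cofactor expansion becomes slow; the accumulation over subsets is the part that the tilling tables deliver directly.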

Instead of adopting the simple arithmetic average (22. 11) 
sans façon, one may go to the following more elaborate pro- 
cedure. Let us put 

(22. 12)    $a'_{Ki} = a_{Ki} - \lambda\,c_{Ki}$ 

where 

(22. 13)    $c_{Ki} = a_{Ki} - \bar a_{Ki}$ 

and λ is a parameter to be determined in such a way that the 
smoothing comes the nearest possible to making the matrix $a'_{Ki}$ 
of rank κ. Since a common factor of proportionality is of no 
avail for the smoothing of the matrix considered, (22. 12) 
represents the most general form of linear smoothing that can 
be based on the two elements $a_{Ki}$ and $\bar a_{Ki}$. 
The moments of the new matrix will be 

$\mu'_{KH} = \sum_i a'_{Ki}\,a'_{Hi} = \sum_i a_{Ki}\,a_{Hi} - \lambda \sum_i (a_{Ki}\,c_{Hi} + a_{Hi}\,c_{Ki}) + \lambda^2 \sum_i c_{Ki}\,c_{Hi}$ 

Hence 

(22. 14)    $\mu'_{KH} = \mu_{KH} - \lambda\,\nu_{KH} + \lambda^2\,\gamma_{KH}$ 

where $\mu_{KH}$ is the moment matrix of the original coefficients, 
defined by (22. 4), and 

(22. 15)    $\nu_{KH} = \sum_i (a_{Ki}\,c_{Hi} + a_{Hi}\,c_{Ki})$ 

(22. 16)    $\gamma_{KH} = \sum_i c_{Ki}\,c_{Hi}$ 

As a check on (22. 15) and (22. 16) we have 

(22. 17)    $\nu_{00} = 2 \sum_i a_{0i}\,c_{0i}$ 

(22. 18)    $\gamma_{00} = \sum_i c_{0i}^2$ 

where $a_{0i}$ and $c_{0i}$ are column sums and $\nu_{00}$ and $\gamma_{00}$ grand totals. 
All the three matrices μ, ν, γ are obviously symmetric. 

Since the elements of the new moment matrix are of the 
form (22. 14), we are led to consider the N-rowed determinant 

(22. 19)    $F(\lambda) = |\,\mu_{KH} - \lambda\,\nu_{KH} + \lambda^2\,\gamma_{KH}\,|$ 

and its various principal minors. Let $F_K$ be the (N − 1)-rowed 
principal minor obtained from (22. 19) by omitting the row and 
column No. K, $F_{KH}$ the (N − 2)-rowed principal minor obtained by 
omitting the rows Nos. K and H and the columns Nos. K and H, 
etc. All these determinants are obviously positive definite for 
any real value of λ because they are moment determinants no 
matter what value we put for λ. This shows that any value of 
λ which makes one of the principal minors of (22. 19) vanish, 
must also make all those principal minors vanish, which contain 
the first minor. 
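
The decomposition of the new moment matrix into μ, ν and γ, together with the two grand-total checks, can be verified numerically. The sketch below assumes the sign convention $a' = a - \lambda c$ with $c = a - \bar a$; the function names and the small test matrices are hypothetical.

```python
def moment(a, b):
    """Matrix of product sums: element (K, H) is sum_i a[K][i] * b[H][i]."""
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b] for ra in a]


def lambda_moments(a, a_bar):
    """Return (c, mu, nu, gamma) for coefficient matrices a and a_bar."""
    N = len(a)
    c = [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, a_bar)]
    mu = moment(a, a)
    ac = moment(a, c)
    nu = [[ac[K][H] + ac[H][K] for H in range(N)] for K in range(N)]
    gamma = moment(c, c)
    return c, mu, nu, gamma


# Hypothetical numbers, only to exercise the identities.
a = [[0.0, 0.6, 0.8], [-0.6, 0.0, -0.8], [-0.8, 0.6, 0.0]]
a_bar = [[0.0, 0.7, 0.7], [-0.7, 0.0, -0.7], [-0.7, 0.7, 0.0]]
c, mu, nu, gamma = lambda_moments(a, a_bar)

lam = 0.3
a_new = [[x - lam * z for x, z in zip(ra, rc)] for ra, rc in zip(a, c)]
direct = moment(a_new, a_new)  # should equal mu - lam * nu + lam**2 * gamma
print(direct[0][1], mu[0][1] - lam * nu[0][1] + lam ** 2 * gamma[0][1])
```

The column sums of a and c give the grand totals of ν and γ without forming the matrices, which is what makes them useful as checks.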

In order that the new coefficient matrix $a'$ shall be of rank 
κ, it is necessary and sufficient that the moment matrix $\mu'$ is 
of rank κ which means that the determinant (22. 19) and all 
its principal minors down to and including the (κ + 1)-rowed 
must vanish. In general it will not be possible to ensure this 
just by disposing of the single parameter λ, but in practice if 
the original coefficient matrix a was near to being of rank κ, 
it may be possible to select a value of λ that will realise very 
closely the vanishing of the principal minors in question. Since 
(22. 19) is positive definite for all values of λ, the shape of the 
function F(λ) and its principal minors will be of the kind 
exhibited in Figure 10. 

None of the functions F can pass zero; since they are positive 
definite for all λ, they can at most touch zero. And in a point 
where any of the $F_K$ touches zero, F must also do so. 

Fig. 10. 

If all the functions F of the order (κ + 1) vanish in 
points that lie tightly together in a group, then we would obtain 
a good solution by putting λ equal to some central value in the 
group. In Section 25 we shall see a numerical example where 
these points lie extremely close. 

If several such groups should exist, we are of course inter- 
ested only in the one with smallest λ; we see indeed by (22. 12) 
that the correction to be applied to $a_{Ki}$ is all the smaller the 
smaller λ. 

Thus from the point of view of numerical computation the 
problem is to plot the minors in question over a 
certain range near λ = 0, and locate their zeros in this range. 

In order to do this we need to develop functions of the kind 
(22. 19) as polynomials in λ. For N = 2 we find 

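(22. 20)    $|\,\mu_{HK} - \lambda\nu_{HK} + \lambda^2\gamma_{HK}\,| = (\mu_{11}\mu_{22} - \mu_{12}\mu_{21})$ 
    $+ (\mu_{12}\nu_{21} + \nu_{12}\mu_{21} - \mu_{11}\nu_{22} - \nu_{11}\mu_{22})\,\lambda$ 
    $+ (\gamma_{11}\mu_{22} + \nu_{11}\nu_{22} + \mu_{11}\gamma_{22} - \gamma_{12}\mu_{21} - \nu_{21}\nu_{12} - \mu_{12}\gamma_{21})\,\lambda^2$ 
    $+ (\nu_{12}\gamma_{21} + \nu_{21}\gamma_{12} - \gamma_{11}\nu_{22} - \nu_{11}\gamma_{22})\,\lambda^3$ 
    $+ (\gamma_{11}\gamma_{22} - \gamma_{12}\gamma_{21})\,\lambda^4$ 

and for N = 3 we get 

(22. 21)    $|\,\mu_{HK} - \lambda\nu_{HK} + \lambda^2\gamma_{HK}\,| = S_{\mu\mu\mu} - S_{\mu\mu\nu}\,\lambda + (S_{\mu\nu\nu} + S_{\mu\mu\gamma})\,\lambda^2$ 
    $- (S_{\mu\nu\gamma} + S_{\nu\nu\nu})\,\lambda^3 + (S_{\mu\gamma\gamma} + S_{\nu\nu\gamma})\,\lambda^4$ 
    $- S_{\nu\gamma\gamma}\,\lambda^5 + S_{\gamma\gamma\gamma}\,\lambda^6$ 

Here $S_{\mu\nu\gamma}$ denotes the sum of all the determinants that can 
be formed by taking in all possible ways one column from the 
matrix μ, one from ν and one from γ, $S_{\mu\mu\nu}$ denotes the similar 
sum when two columns are taken from μ and one from ν, etc. 
When μ, ν and γ are written as affixes, they may be inter- 
preted to have the weights 0, 1 and 2 respectively. The sum 
of the weights in a given term in (22. 21) is equal to the 
exponent of λ, and the term consists of all possible S that have 
subscripts with total weight sum equal to the exponent of λ. 

If the S in (22. 21) are written out explicitly we get 

(22. 22)    $S_{\mu\mu\nu} = |\mu\mu\nu| + |\mu\nu\mu| + |\nu\mu\mu|$ 
    $S_{\mu\nu\nu} = |\mu\nu\nu| + |\nu\mu\nu| + |\nu\nu\mu|$ 
    $S_{\mu\mu\gamma} = |\mu\mu\gamma| + |\mu\gamma\mu| + |\gamma\mu\mu|$ 
    $S_{\mu\nu\gamma} = |\mu\nu\gamma| + |\mu\gamma\nu| + |\nu\mu\gamma| + |\nu\gamma\mu| + |\gamma\mu\nu| + |\gamma\nu\mu|$ 
    $S_{\nu\nu\gamma} = |\nu\nu\gamma| + |\nu\gamma\nu| + |\gamma\nu\nu|$ 
    $S_{\mu\gamma\gamma} = |\mu\gamma\gamma| + |\gamma\mu\gamma| + |\gamma\gamma\mu|$ 

Here $|\mu\mu\nu|$ denotes the determinant where the first and second 
columns are taken from the matrix μ, and the third column 
from the matrix ν, and similarly for the other symbols in 
(22. 22). The rules indicated above are general and apply to 
any N. 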
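
For any N the coefficients of the polynomial in λ can also be generated mechanically, by expanding the determinant with each element treated as a quadratic in λ. The sketch below does this by cofactor expansion on coefficient lists; the function names and the sample matrices are hypothetical, and the sign convention assumed is $\mu - \lambda\nu + \lambda^2\gamma$.

```python
def poly_add(p, q):
    """Add two polynomials given as coefficient lists (lowest power first)."""
    n = max(len(p), len(q))
    p = p + [0.0] * (n - len(p))
    q = q + [0.0] * (n - len(q))
    return [x + y for x, y in zip(p, q)]


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out


def poly_det(m):
    """Determinant of a matrix whose elements are coefficient lists."""
    if len(m) == 1:
        return m[0][0]
    total = [0.0]
    for j in range(len(m)):
        sub = [row[:j] + row[j + 1:] for row in m[1:]]
        term = poly_mul(m[0][j], poly_det(sub))
        if j % 2:
            term = [-t for t in term]
        total = poly_add(total, term)
    return total


def f_coefficients(mu, nu, gamma):
    """Coefficients of F(lambda) = |mu - lambda*nu + lambda^2*gamma|."""
    N = len(mu)
    entries = [[[mu[K][H], -nu[K][H], gamma[K][H]] for H in range(N)]
               for K in range(N)]
    return poly_det(entries)


# Hypothetical two-rowed example; the result has five coefficients.
mu = [[1.0, 0.5], [0.5, 1.0]]
nu = [[0.2, 0.1], [0.1, 0.4]]
gamma = [[0.05, 0.01], [0.01, 0.03]]
print(f_coefficients(mu, nu, gamma))
```

When the higher coefficients are negligible, as the text reports for its example, the list can simply be truncated before locating the zeros.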

In practice the computations are not so elaborate as they 
may appear from this theoretical analysis; indeed, in practice 
the higher terms will frequently vanish nearly exactly so that 
we actually need to work only with polynomials of rather low 
order. The example of Section 25 shows this. This example 
also turns out to lead nearly exactly to the simple arithmetic 
average which we put up heuristically in formula (22. 11). 


PART IV. APPLICATIONS. 

23. A CONSTRUCTED CONFLUENCY EXAMPLE IN 5 VARIATES. 

To test the various procedures suggested in the preceding 
Sections and compare their relative merits we shall analyse a 
constructed numerical example and some examples drawn from 
actual economic data. We begin with the constructed example. 

Consider four variates $x_1 \ldots x_4$ whose values in each observation 
point are determined by (13. 1) where the y's are variates 
determined by random drawings. In the example each 
observation was determined by the average of end digits in 100 
consecutive drawings in the Norwegian State Lottery. All the 
individual observations of a given $y_i$ as well as all the various 
$y_i$ were independent in the sense that drawings that were used 
to determine a certain observation of a given $y_i$ were not used 
for any other purpose, that is neither for other observations of 
this $y_i$ nor for observations of any of the other $y_j$. 

The Norwegian lottery drawings are done without putting the 
numbers drawn back into the urn. The probability for a given 
end digit in the various drawings is therefore not exactly con- 
stant, but the effect is so slight that it would be without any 
influence on the calculations in the example. 

As a fifth observational variate was introduced $x_5 = y_7$, in 
other words $x_5$ was simply itself an erratic variate determined 
by lottery drawings. 

Since the y's in this example must be considered as "in- 
dependent causes", the systematic parts of the observational 
variates must be interpreted as that part of the right member 
in (13. 1) that consists of y's occurring at the same time in more 
than one x. In other words we have (13. 2), while $x_5$ is an 
observational variate quite extraneous to the whole system. 
Eliminating $y_1$ and $y_2$ we get the "true" regression system 
(13. 3). We shall now apply the methods developed in the 
preceding Sections and see if they are capable of finding the 
"true" regressions (13. 3) and of indicating that a regression in 
the big set (1234) has no meaning. 

When the basic variates $y_i$ are determined by lottery draw- 
ings, as explained above, they will, of course, not become 
rigorously uncorrelated. This, however, only makes our example 
all the more realistic. The correlation coefficients as deter- 
mined in a series of 100 observations of the seven variates $y_i$ 
turned out to be as indicated in (23. 1). 


TABLE (23. 1). CORRELATION COEFFICIENTS BETWEEN RANDOM 
VARIATES 

 s_ij        j = 1          2          3          4          5          6          7
i = 1     1.000000  -0.132076  -0.082119   0.093466   0.109091   0.146380  -0.233315
    2                1.000000   0.021295   0.019510   0.024809   0.055105   0.206633
    3                           1.000000  -0.139912  -0.108802  -0.069751   0.094886
    4                                      1.000000   0.002314   0.187958   0.070017
    5                                                 1.000000   0.041966  -0.268182
    6                                                            1.000000   0.156706
    7                                                                       1.000000

We shall interpret the variates y that enter in the definition 
(13. 1) of the x to have unit sumsquare. The moments $m_{ij}$ of 
the observational variates are therefore given by (7. 5). 

The numerical computation of such bilinear forms is most
easily carried through by first forming the matrix $P_{ij}$ defined by

(23.2)    $P_{ij} = \sum_{k=1}^{M} p_{ik} s_{kj}$

In other words, $P_{ij}$ is the product sum of a row in the matrix $p$
and a column (or, since $s$ is symmetric, a row) in $s$. The moment
matrix is then calculated by

(23.3)    $m_{ij} = \sum_{k=1}^{M} p_{ik} P_{jk}$

In other words, $m_{ij}$ is the product sum of a row in $p$ and a row
in $P$.

For checking purposes it is convenient to introduce the
column sums

(23.4)    $p_{0j} = \sum_{i=1}^{n} p_{ij}$

The sums (23.4) are simply handled as an $(n+1)$th row of
the matrix $p$; in other words, it gives rise to the determination
of an $(n+1)$th row $P_{0j}$ of $P$ by the general formula (23.2).
Each element in the $(n+1)$th row of $P$ thus determined shall
at the same time be the sum of the elements in the column in
which it stands; in other words, we shall have

(23.5)    $P_{0j} = \sum_{k=1}^{n} P_{kj}$

In the computation of the moment matrix the bottom rows of $p$
and $P$ are handled just as the other rows; this gives the
magnitudes

(23.6)    $m_{0j} = \sum_{k=1}^{M} p_{0k} P_{jk}$

and

(23.7)    $m_{00} = \sum_{k=1}^{M} p_{0k} P_{0k}$

$m_{0j}$ ought to be at the same time the sum of the elements in
column $j$ of the moment matrix, and $m_{00}$ the grand total of all
elements.
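The checking scheme of (23.2) to (23.7) can be sketched in a few lines. This is only an illustration under stated assumptions: the coefficient matrix `p_demo` and the correlation matrix `s_demo` are hypothetical stand-ins (the actual coefficients of (13.1) are not reproduced here), and the function name is invented for the example. The check row is appended as the last row rather than being numbered 0.

```python
import numpy as np

def moments_with_check_row(p, s):
    """Form P = p s and m = p P^T, carrying the column sums of p as an extra row."""
    p = np.asarray(p, dtype=float)
    s = np.asarray(s, dtype=float)
    p_ext = np.vstack([p, p.sum(axis=0)])   # (23.4): column sums appended as check row
    P_ext = p_ext @ s                        # (23.2), including the check row of P
    m_ext = p_ext @ P_ext.T                  # (23.3), (23.6) and (23.7) in one product
    return P_ext, m_ext

# Hypothetical data: 3 observational variates built from 4 basic variates.
p_demo = np.array([[1.0, 0.0, 1.0, 0.0],
                   [0.0, 1.0, 1.0, 0.0],
                   [1.0, 1.0, 0.0, 1.0]])
s_demo = np.array([[1.00, 0.10, -0.20, 0.00],
                   [0.10, 1.00, 0.05, 0.30],
                   [-0.20, 0.05, 1.00, -0.10],
                   [0.00, 0.30, -0.10, 1.00]])

P_ext, m_ext = moments_with_check_row(p_demo, s_demo)
n = p_demo.shape[0]
check_P = np.allclose(P_ext[-1], P_ext[:n].sum(axis=0))        # (23.5)
check_m = np.allclose(m_ext[-1, :n], m_ext[:n, :n].sum(axis=0))  # column sums of m
check_total = np.isclose(m_ext[-1, -1], m_ext[:n, :n].sum())     # grand total
print(check_P, check_m, check_total)
```

The three checks hold for any `p` and symmetric `s`, which is what makes the extra row useful as a control on hand computation.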

Using this technique one easily determines directly (without
computing first the individual $x_i$ observations) the following
$x$ moments:


TABLE (23.8). MOMENTS IN THE CONSTRUCTED EXAMPLE.

 m_ij    j = 1          2          3          4          5
 i = 1   0.993576  -0.121999   0.871663   1.135675  -0.223826
     2              1.013902   0.881726  -1.117290   0.213635
     3                         1.772628   0.028997  -0.053500
     4                                    2.292407  -0.424277
     5                                               1.000000

And from this we get the following correlation coefficients:

TABLE (23.9). CORRELATION COEFFICIENTS IN THE CONSTRUCTED EXAMPLE.

 r_ij    j = 1          2          3          4          5
 i = 1   1.000000  -0.121551   0.656809   0.752502  -0.224549
     2              1.000000   0.657698  -0.732862   0.212166
     3                         1.000000   0.014386  -0.040183
     4                                    1.000000  -0.280223
     5                                               1.000000
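The passage from the moments to the correlation coefficients is the usual normalisation $r_{ij} = m_{ij} / \sqrt{m_{ii} m_{jj}}$. A minimal sketch, using the figures of Table (23.8) as input; the variable and function names are chosen for the illustration only.

```python
import numpy as np

# Upper triangle of Table (23.8); the lower triangle is filled in by symmetry.
upper = np.array([
    [0.993576, -0.121999, 0.871663, 1.135675, -0.223826],
    [0.0, 1.013902, 0.881726, -1.117290, 0.213635],
    [0.0, 0.0, 1.772628, 0.028997, -0.053500],
    [0.0, 0.0, 0.0, 2.292407, -0.424277],
    [0.0, 0.0, 0.0, 0.0, 1.000000],
])
m = upper + np.triu(upper, 1).T

def correlations(moments):
    """Divide each moment by the square roots of the two diagonal moments."""
    scale = np.sqrt(np.diag(moments))
    return moments / np.outer(scale, scale)

r = correlations(m)
print(np.round(r, 6))
```

Up to rounding in the last decimal place the result agrees with Table (23.9).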


By the technique of Section 15 this gives the following
tilling tables:


TABLE 1. TILLING TABLES. TWO-SETS:

 (12)    1          2          (14)    1          4          (25)    2          5
   1   1.000000   0.121551       1   1.000000  -0.752502       2   1.000000  -0.212165
   2              1.000000       4              1.000000       5              1.000000
 A =   0.985225   0.985225     A =   0.433741   0.433741     A =   0.954986   0.954986

 (13)    1          3          (24)    2          4          (35)    3          5
   1   1.000000  -0.656809       2   1.000000   0.732862       3   1.000000   0.040183
   3              1.000000       4              1.000000       5              1.000000
 A =   0.568602   0.568602     A =   0.462913   0.462913     A =   0.998385   0.998385

 (23)    2          3          (34)    3          4          (45)    4          5
   2   1.000000  -0.657698       3   1.000000  -0.014385       4   1.000000   0.280223
   3              1.000000       4              1.000000       5              1.000000
 A =   0.567433   0.567433     A =   0.999793   0.999793     A =   0.921475   0.921475

 (15)    1          5
   1   1.000000   0.224549
   5              1.000000
 A =   0.949578   0.949578


TABLE 2. TILLING TABLES. THREE-SETS:

 (123)     1          2          3
   1     0.567433   0.553533  -0.736753
   2                0.568602  -0.737534
   3                           0.985225
 A =     0.016245   0.016245   0.016244

 (124)     1          2          4
   1     0.462913  -0.429929  -0.663422
   2                0.433741   0.641395
   4                           0.985225
 A =     0.015945   0.015945   0.015946

 (134)     1          3          4
   1     0.999793  -0.645984  -0.743054
   3                0.433741   0.479865
   4                           0.568602
 A =     0.016355   0.016356   0.016355

 (234)     2          3          4
   2     0.999793  -0.668240   0.742323
   3                0.462913  -0.496387
   4                           0.567433
 A =     0.016273   0.016272   0.016272

 (125)     1          2          5
   1     0.954986   0.073910   0.198760
   2                0.949578  -0.184871
   5                           0.985225
 A =     0.901371   0.901371   [illegible]

 (135)     1          3          5
   1     0.998385  -0.647786   0.198156
   3                0.949578  -0.107303
   5                           0.568602
 A =     0.528418   0.528418   0.528418

 (235)     2          3          5
   2     0.998385  -0.666223  -0.238593
   3                0.954986   0.179723
   5                           0.567433
 A =     0.509590   0.509591   0.509590

 (145)     1          4          5
   1     0.921475  -0.689578   0.013681
   4                0.949578   0.111249
   5                           0.433741
 A =     0.399494   0.399495   0.399494

 (245)     2          4          5
   2     0.921475   0.673408  -0.006800
   4                0.954986   0.124735
   5                           0.462913
 A =     0.426517   0.426517   0.426517

 (345)     3          4          5
   3     0.921475  -0.003125   0.036152
   4                0.998385   0.279645
   5                           0.999793
 A =     0.919977   0.919977   0.919977


TABLE 3. TILLING TABLES. FOUR-SETS:

 (1234)    1          2          3          4
   1     0.016273   0.001832  -0.011739  -0.010733
   2                0.016355  -0.012116   0.010782
   3                           0.015945  -0.000275
   4                                      0.016245
 A =     0.000263   0.000262   0.000262   0.000263

 (1235)    1          2          3          5
   1     0.509590   0.505360  -0.667867  -0.019629
   2                0.528418  -0.680509  -0.025978
   3                           0.901371   0.030631
   5                                      0.016245
 A =     0.013910   0.013910   0.013910   0.013910

 (1245)    1          2          4          5
   1     0.426517  -0.396262  -0.608766   0.009256
   2                0.399494   0.588487  -0.008831
   4                           0.901371  -0.008970
   5                                      0.015945
 A =     0.014507   0.014507   0.014507   0.014507

 (1345)    1          3          4          5
   1     0.919977  -0.594764  -0.686440  -0.009676
   3                0.399494   0.443732   0.006843
   4                           0.528418   0.011766
   5                                      0.016355
 A =     0.014956   0.014956   0.014957   0.014956

 (2345)    2          3          4          5
   2     0.919977  -0.616012   0.674403  -0.030957
   3                0.426517  -0.451624   0.021279
   4                           0.509590  -0.018434
   5                                      0.016272
 A =     0.014015   0.014015   0.014015   0.014015


TABLE 4. TILLING TABLES. FIVE-SETS:

 (12345)   1          2          3          4          5
   1     0.014016   0.001986  -0.010390  -0.009002  -0.000214
   2                0.014956  -0.011298   0.009483  -0.000524
   3                           0.014507  -0.000531   0.000498
   4                                      0.013910  -0.000156
   5                                                 0.000263
 A =     0.000223   0.000223   0.000224   0.000222   0.000224
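The entries of these tables can be reproduced from the correlations of Table (23.9): the printed figures agree with the cofactors (adjoint) of the correlation matrix of each set, and the line A with its determinant. The following is a minimal sketch on that reading, not a transcription of the desk-machine technique of Section 15; the function name `tilling_table` is an assumption of the example.

```python
import numpy as np

# Correlation matrix of Table (23.9), variates numbered 1 to 5.
R = np.array([
    [1.000000, -0.121551, 0.656809, 0.752502, -0.224549],
    [-0.121551, 1.000000, 0.657698, -0.732862, 0.212166],
    [0.656809, 0.657698, 1.000000, 0.014386, -0.040183],
    [0.752502, -0.732862, 0.014386, 1.000000, -0.280223],
    [-0.224549, 0.212166, -0.040183, -0.280223, 1.000000],
])

def tilling_table(corr, variates):
    """Cofactor matrix and determinant of the correlation sub-matrix of a set (two or more variates)."""
    idx = [v - 1 for v in variates]
    sub = corr[np.ix_(idx, idx)]
    n = len(idx)
    adj = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(sub, i, axis=0), j, axis=1)
            adj[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return adj, float(np.linalg.det(sub))

adj_123, det_123 = tilling_table(R, (1, 2, 3))
adj_1234, det_1234 = tilling_table(R, (1, 2, 3, 4))
print(np.round(adj_123, 6), round(det_123, 6))
print(round(det_1234, 6))
```

Because the input correlations are rounded to six places, agreement with the printed tables can only be expected to within a few units of the last decimal.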


Now let us interpret the results. Consider the scatterances
first. They are contained in the tilling tables. We start
by considering the two-sets. None of the two-sets shows any
good linearity. The best are (14) with a scatterance* of 0.433
and (24) with a scatterance of 0.462. If any relations worth
while considering are to be found, it is clear that we must at
least pass to the three-sets. Amongst the three-sets all those
contained in (1234) stand out so distinctly that there can be no
doubt that here is something significant. The scatterances in
these sets are

(23.10)
          Sets     Scatterances
          123      0.0162
          124      0.0159
          134      0.0163
          234      0.0162

And the scatterances in the other three-sets range from about
0.4 to 0.9. Since none of the two-dimensional scatterances are
small, any of the 4 three-sets in (23.10) may be accepted.
This being so we ought by (V) of Section 1 to pass on to the
four-sets. The four-set with the smallest scatterance is (1234);
the scatterance is here 0.00026, while the other four-dimensional
scatterances range about 0.014. If we should let ourselves be guided
uniquely by the smallness of the scatterance, the set (1234)
would seem excellent, particularly because there is a heavy
drop from the subscatterances. But by (IV) of Section 1, the
set (1234) must be rejected because all its subscatterances are
nearly equal, as is seen from (23.10). The criterion indicates
that (1234) may be a very dangerous set.

* When in the text I give abbreviated figures whose correct values are to
be found in another place, I do not raise the last digit even if the first digit
dropped is 5 or more. It is then easier to recognise the correct figure in the
tables if it is wanted to look it up.
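The screening just described can be imitated numerically. The sketch below takes the scatterance of a set to be the determinant of its correlation matrix (which is how the A-lines of the tilling tables behave) and lists it for every three-set; the helper name and the split into sets with and without variate 5 are assumptions of the illustration, not part of the original procedure.

```python
from itertools import combinations
import numpy as np

# Correlation matrix of Table (23.9), variates numbered 1 to 5.
R = np.array([
    [1.000000, -0.121551, 0.656809, 0.752502, -0.224549],
    [-0.121551, 1.000000, 0.657698, -0.732862, 0.212166],
    [0.656809, 0.657698, 1.000000, 0.014386, -0.040183],
    [0.752502, -0.732862, 0.014386, 1.000000, -0.280223],
    [-0.224549, 0.212166, -0.040183, -0.280223, 1.000000],
])

def scatterance(corr, variates):
    """Determinant of the correlation sub-matrix belonging to the set."""
    idx = [v - 1 for v in variates]
    return float(np.linalg.det(corr[np.ix_(idx, idx)]))

three_sets = {s: scatterance(R, s) for s in combinations(range(1, 6), 3)}
inside = {s: d for s, d in three_sets.items() if 5 not in s}    # contained in (1234)
outside = {s: d for s, d in three_sets.items() if 5 in s}       # involve variate 5

for s, d in sorted(three_sets.items(), key=lambda item: item[1]):
    print(s, f"{d:.4f}")
print("(1234):", f"{scatterance(R, (1, 2, 3, 4)):.5f}")
```

The four three-sets inside (1234) come out close to one another, which is the situation criterion (IV) warns against; the sets containing variate 5 are larger by more than an order of magnitude.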

The four-set with the second smallest scatterance is (1235).
Its subscatterances are

(23.11)
          Sets     Scatterances
          123      0.016
          125      0.901
          135      0.528
          235      0.509

There is here a definite difference between the subscatterances
(and all of them cannot by any means be said to be small
either), so that there is no danger in accepting the set (1235).
But by comparing the scatterance in (1235), namely 0.0139, with
the smallest subscatterance in (1235), namely 0.016, we find
that there is practically nothing gained by passing from the
three-set (123) to the four-set (1235). A similar analysis applies
to the other four-sets containing 5.

Since the five-set (12345) contains (1234), which has already
been recognised as dangerous, we cannot get any further.

The conclusions here obtained by using only the scatterances
may be resumed thus: Each of the 4 three-sets contained in
(1234) is collinear, making it nonsense to speak of a regression
in (1234). The variate 5 is extraneous to the system. Of
course, this is just the correct conclusion which we ought to
obtain.

In Section 33 we shall see what perfectly absurd results are
obtained by applying to the present case the usual regression
technique and the significance criteria which follow from
sampling theory.

24. BUNCH ANALYSIS OF THE CONSTRUCTED EXAMPLE.

Now let us apply the bunch analysis technique. This will
lead to results which are still more definite and conclusive.
The complete bunch map, exhibiting graphically the numbers
in Tables 1-4 in Section 23, is given in Figure 11.

Let us first follow the behaviour of the bunch of the
intercoefficient (12) in all possible sets. In the set (12) this
bunch is very poor, which simply means that the gross
correlation between the variates Nos. 1 and 2 is small. But
if we add the variate No. 3, the bunch of the intercoefficient
(12) is immediately tightened in a very conspicuous way. The
tightness of the (12) bunch in the original (12) set and in the
set (123) cannot be compared at all. (See Figure 11.) Furthermore,
the 3-beam falls inside the sector of the other beams
in the new set, and the 3-beam is longer than the other beams,
and finally the other beams in the (12) bunch are shortened by
the inclusion of 3. All of this points to No. 3 being essentially
relevant. There can be no doubt that it is useful for the
determination of the (12) intercoefficient. Adding 4 to the set
(12) we get essentially a similar result: The (12) bunch is
tightened, the 4-beam falls inside, the 4-beam is longer than
the other beams and the other beams are shortened by the
inclusion of 4. There is no doubt that 4 is useful. Thus, both
3 and 4 are essentially useful for the determination of the (12)
coefficient, and when either of these variates is added, the
result is a very good fit (judging the fit by the tightness of
the bunch).

But the two slopes thus determined, the (12) slope in (123)
and the (12) slope in (124), are essentially different: indeed,
the former is negative and the latter positive. A glance at
Figure 11, taking account of the tightness of both bunches and
of the conspicuous difference in the slope, tells us that what is
revealed by these two bunches is in all probability the (12)
slope in two different equations. It does not seem possible that
both these two bunches reveal the (12) slope in one and the
same equation. This would indeed mean that the difference in
slope which we have observed is only a gross-slope effect. In
other words the manner in which the variates Nos. 3 and 4
have happened to vary in the material must then have been such
that the empirically determined (12)-coefficient in the set (123)
has been biassed by our not taking account of 4, and in the set
(124) by our not taking account of 3. The tightness of the two
observed bunches and the conspicuous difference in slopes
makes it probable that no such bias exists, each of the two
observed bunches representing an unbiassed coefficient, but
in two different equations. This is already a rather definite
indication of the multicollinearity which (from the nature of
the constructed data) we know exists in the set (1234). We
shall not, however, yet accept finally the conclusion of the
multicollinearity in the set (1234) but continue the systematic
scrutiny of the bunch map.

Adding 5 to the set (12), we find that this variate is quite
irrelevant. It does not tighten the bunch (it rather opens it),
and the 5-beam lies outside and is much shorter than the other
beams, and these other beams are not appreciably shortened by
the inclusion of 5, nor are their directions changed.

Proceeding now to the four-sets, we see that in (1234) there
is a definite explosion of the (12) bunch. It is particularly
illuminating to watch the behaviour of the (12) bunch as we
pass either from the set (123) or (124) to the bigger set (1234).
In the former sets the tightness is excellent, while in the latter
set there is hardly any organisation at all left in the bunch.
3 is decidedly detrimental for the (12) slope when added to
the set (124), and 4 is the same when added to (123).

It is also interesting to note the degradation effect produced
in the (12) coefficient in the (1234) set. The beams 1 and 2
in this bunch are virtually the same as in the original set (12).
Of course the absolute scale is different in the (1234) set, but
the slopes of the beams are practically the same. The scale in
the set (1234) is 40 times as large as in the other sets considered
so far. This is a further indication that (1234) is a multiply
collinear set. Thus the consideration of the bigger set which
includes both (123) and (124) verifies the suspicion we already
got by studying these two sets separately.
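The behaviour of the (12) bunch described above can be followed numerically without the figure. The sketch assumes that each row k of the adjoint correlation matrix of a set gives one elementary regression (the one minimised in the direction of variate k), so that the slope of variate 1 with respect to variate 2 implied by that row is minus the ratio of its two entries. The sign and axis conventions of the printed bunch map may differ; the function names are invented for the illustration.

```python
import numpy as np

# Correlation matrix of Table (23.9), variates numbered 1 to 5.
R = np.array([
    [1.000000, -0.121551, 0.656809, 0.752502, -0.224549],
    [-0.121551, 1.000000, 0.657698, -0.732862, 0.212166],
    [0.656809, 0.657698, 1.000000, 0.014386, -0.040183],
    [0.752502, -0.732862, 0.014386, 1.000000, -0.280223],
    [-0.224549, 0.212166, -0.040183, -0.280223, 1.000000],
])

def adjoint(corr, variates):
    """Adjoint of the (symmetric, non-singular) correlation sub-matrix of a set."""
    idx = [v - 1 for v in variates]
    sub = corr[np.ix_(idx, idx)]
    return np.linalg.det(sub) * np.linalg.inv(sub)

def bunch_slopes(corr, variates, i, j):
    """Slope of variate i on variate j from each elementary regression in the set."""
    adj = adjoint(corr, variates)
    pos_i, pos_j = variates.index(i), variates.index(j)
    return {k: -adj[row, pos_j] / adj[row, pos_i] for row, k in enumerate(variates)}

s12 = bunch_slopes(R, (1, 2), 1, 2)
s123 = bunch_slopes(R, (1, 2, 3), 1, 2)
s124 = bunch_slopes(R, (1, 2, 4), 1, 2)
s1234 = bunch_slopes(R, (1, 2, 3, 4), 1, 2)
for name, slopes in [("(12)", s12), ("(123)", s123), ("(124)", s124), ("(1234)", s1234)]:
    print(name, {k: round(v, 3) for k, v in slopes.items()})
```

On this reading the slopes in (123) lie close together and are negative, those in (124) lie close together and are positive, and in (1234) they scatter widely, with beams 1 and 2 falling back to nearly the slopes they had in the original set (12).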

If we add 5 either to (123) or to (124) the bunch of the (12)
coefficient is virtually unchanged; a glance at the figure is
sufficient to show that none of the beams are appreciably
changed. Furthermore, in both cases the 5-beam is very much
smaller: it is indeed so small that, compared with the other
beams, it only appears as a tiny point near origin, all of which
indicates 5 as superfluous in the determination of the (12)
coefficient.

A similar conclusion is reached by adding 5 to the set (1234):
hardly any change is produced in the (12) bunch. The
bunch in the set (12345) is just as exploded as it was in the set
(1234).

Now consider the (13) coefficient. In the set (13) itself there
is no significant tightness of the (13) bunch, but adding 2 or 4,
we get immediately very tight bunches. In particular the (13)
bunch in the set (134) is so close that, in the scale used in the
figure, the three beams fall literally in one line. Again we see
that the two (13) slopes obtained in the sets (123) and (134) are
quite different, which means that either they must belong to
two different equations, or be very heavily biassed (i.e. not net
determinations). A glance at the (13) bunch in the set (1234)
verifies that we must have the former case: the bunch in the
(1234) set is indeed distinctly exploded as compared with either
the (123) or the (134) set. Adding 5 we find again that it is
superfluous.

In a similar way we can discuss each of the other
intercoefficients (23), (14), etc. The result of the discussion is
summarised in the star map in Figure 12. Here an asterisk, a
circle or a blackball means a useful, superfluous and
detrimental variate respectively.

On the basis of the star map and the bunch map we can now
discuss the equations as such from the view-point of linear
confluency, using the criteria of Section 18. A glance at the
star map tells us immediately that all three-sets contained
in (1234) are promising. Indeed, whatever variate we add,
it appears as useful. And from the bunch map we see that each
of these variates is not only useful but does what is necessary
to produce a good fit. Indeed, all the bunches in the set (123)
are good, and so are all the bunches in the set (124) etc. They
are so good that it seems difficult to escape the conclusion that
any of these sets is a closed, admissible set. The final test on
this is given by the horizontal section (1234) in the star map
(consisting of the four lines 1, 2, 3, 4). Each and all of the
signs in this section is a blackball, indicating that no matter
what variate we add in order to get the set (1234), it will
appear as detrimental. And this applies no matter which one of
the intercoefficients we consider. And taking a glance at the
bunch map we see that this detrimental effect is violent: all
the bunches in the set (1234) are definitely exploded.

It is also illuminating to study, by means of the star map,
the four-sets obtained by adding 5 to any of the three-sets
contained in (1234). Take, for instance, the set (1235). All the
variates 1, 2, 3 in this set are indicated exclusively by asterisks,
while 5 is indicated exclusively by circles. Similarly in the
sets (1245), (1345) and (2345). The superfluity of 5 is also
checked by the zero slope criterion. Indeed, from the bunch
map we see that the intercoefficient between 5 and any one of
the variates 1, 2, 3, 4 is zero in all the four-sets containing 5.

The conclusion is thus definitive. Each of the three-sets
contained in (1234) is a closed set, and hence (1234) itself is
multiply collinear, while 5 is a variate entirely extraneous to
the system. This is just the conclusion we ought to find.

The set (12345) is primarily interesting because of the
degradation effects and the persistency effects that manifest
themselves here. The leading beams in the (12) coefficient in
this all-inclusive set are just the same as they were in
the original two-set (12), and the same applies to the
intercoefficients (13), (23), (14), (24), (34). This is the
degradation effect. None of these bunches are tight; they are all
exploded.

But the bunches (15), (25), (35), (45) are tight in the set
(12345) even though this set is multiply collinear. More precisely
these bunches show a zero slope. The explanation is that,
even in the all-inclusive set, these intercoefficients will have a
meaning. No matter how we derive a five-variate equation
from any of the closed admissible equations (123), (124) . . .
(1235) . . . etc. the new equation will have the same coefficient
of the variate No. 5, namely zero. This is the persistency effect
discussed in Section 11. Incidentally, since the persistency
effect here gives zero slope (instead of some other well defined
slope), we can take it as an additional criterion of the superfluity
of variate No. 5.

25. COMPATIBILITY SMOOTHING OF THE REGRESSION COEFFICIENTS
IN THE CONSTRUCTED EXAMPLE.

Since all the three-sets contained in (1234) are admissible,
we are confronted with a problem of compatibility smoothing
of the regression coefficients in these 4 sets. For instance, if
the coefficients in (123) are determined empirically as the
diagonal regression coefficients in this set, and similarly in
(124), we can from these results derive the coefficients that
ought to exist in the set (134); and these may not coincide
exactly with the diagonal regression coefficients determined
directly in the set (134).

We shall first see how the compromise can be made by the
method of diagonal zeros explained in Section 21. The matrix
$a_{Ki}$ in (21.1) is in the present case determined by forming the
N = 4 diagonal regression equations in the 4 three-sets contained
in (1234). The material for this (when the variates are
taken in the normalised form) is of course contained in the
tilling tables. Performing the rootsquaring we get


TABLE (25.1). REGRESSION COEFFICIENTS IN THE CONSTRUCTED EXAMPLE.

 a_Ki             i = 1        2          3          4       Natural     Absolute
                                                             row sum     row sum
 (234)  K = 1    .000000    .999896   -.680377    .753281    1.072800    2.433554
 (134)      2    .999896    .000000   -.658590   -.754057    -.412751    2.412543
 (124)      3    .680377   -.658590    .000000   -.992585    -.970798    2.331552
 (123)      4    .753281    .754057   -.992585    .000000     .514753    2.499923
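The rootsquaring that leads to Table (25.1) can be sketched as follows. The assumption made here is that the diagonal regression coefficient of a variate is the square root of the corresponding diagonal element of the three-set tilling table, with its sign taken from the first row of that table (so that products of signs reproduce the signs of the off-diagonal elements). The input figures are those of Table 2 of Section 23; the function name is invented for the example.

```python
import numpy as np

# Upper triangles of the tilling tables of the four three-sets contained in (1234).
three_sets = {
    (2, 3, 4): [[0.999793, -0.668240, 0.742323], [0, 0.462913, -0.496387], [0, 0, 0.567433]],
    (1, 3, 4): [[0.999793, -0.645984, -0.743054], [0, 0.433741, 0.479865], [0, 0, 0.568602]],
    (1, 2, 4): [[0.462913, -0.429929, -0.663422], [0, 0.433741, 0.641395], [0, 0, 0.985225]],
    (1, 2, 3): [[0.567433, 0.553533, -0.736753], [0, 0.568602, -0.737534], [0, 0, 0.985225]],
}

def diagonal_regression(adj_upper):
    """Square roots of the diagonal, signed relative to the first variate of the set."""
    adj = np.array(adj_upper, dtype=float)
    magnitude = np.sqrt(np.diag(adj))
    sign = np.sign(adj[0])   # first row: positive diagonal, signed off-diagonals
    return sign * magnitude

a = np.zeros((4, 4))
for variates, adj_upper in three_sets.items():
    K = ({1, 2, 3, 4} - set(variates)).pop()     # the variate left out labels the row
    a[K - 1, [v - 1 for v in variates]] = diagonal_regression(adj_upper)

print(np.round(a, 6))
print(np.round(a.sum(axis=1), 6), np.round(np.abs(a).sum(axis=1), 6))
```

The last line prints the natural and the absolute row sums, which serve as the check columns of the table.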


Reducing by the absolute row sums we get


TABLE (25.2). REGRESSION COEFFICIENTS REDUCED TO ABSOLUTE
ROW SUM UNITY.

 a_Ki     i = 1        2          3          4       Natural     Absolute
                                                     row sum     row sum
 K = 1   .000000    .410879   -.279582    .309539     .440836    1.000000
     2   .414457    .000000   -.272986   -.312557    -.171086    1.000000
     3   .291813   -.282469    .000000   -.425719    -.416375    1.000001
     4   .301322    .301632   -.397046    .000000     .205908    1.000000


The compatibility table defined by (21.7) and computed on
the basis of the figures in (25.2) turns out to be

TABLE (25.3). COMPATIBILITY TABLE.

 K            i = 1        2          3          4       Absolute
                                                         row sum
 1     23    .000000    .415195   -.282520    .302285    1.000000
       24    .000000    .414644   -.272980    .312377    1.000001
       34    .000000    .414915   -.277665    .307420    1.000000

 2     13    .418708    .000000   -.275785   -.305507    1.000000
       14    .418294    .000000   -.266257   -.315449    1.000000
       34    .418504    .000000   -.271099   -.310396     .999999

 3     12    .289743   -.280463    .000000   -.429794    1.000000
       14    .294609   -.275595    .000000   -.429796    1.000000
       24    .285035   -.285171    .000000   -.429794    1.000000

 4     12    .299326    .299634   -.401039    .000000     .999999
       13    .304304    .294723   -.400973    .000000    1.000000
       23    .294177    .304718   -.401105    .000000    1.000000


In the actual work it will be found convenient to compute
also an intermediate table giving the results of (21.6) (possibly
with a sign factor). In this intermediate table both the natural
and the absolute row sums are carried, while in (25.3) only the
absolute row sum is carried.

A horizontal section of (25.3) corresponds to a row in (25.2),
and the coefficients in these two tables are directly comparable.
For instance for i = 1, K = 4, the figures 0.299, 0.304, 0.294 are
to be compared with 0.301. If the four regression equations
had been exactly compatible, all these numbers would have been
equal. The amount of spread which is actually present in these
figures indicates the degree of non-compatibility which exists
between the equations used.

According to (21.8) we next form the simple arithmetic
average of the three figures in each (Ki) cell of (25.3). Then
according to (21.9) we take the average between the above
average and the original figure in (25.2). This gives the figures
on the rows ν = 1 in the various cells of (25.4).

TABLE (25.4). SUCCESSIVE SMOOTHING.

 K    ν      i = 1        2          3          4       Natural     Absolute
                                                        row sum     row sum
 1    0     .000000    .410879   -.279582    .309539     .440836    1.000000
      1     .000000    .412898   -.278652    .308450     .442696    1.000000
      2     .000000    .412897   -.278652    .308451     .442696    1.000000

 2    0     .414457    .000000   -.272986   -.312557    -.171086    1.000000
      1     .416480    .000000   -.272017   -.311504    -.167041    1.000001
      2     .416478    .000000   -.272017   -.311505    -.167044    1.000000

 3    0     .291813   -.282469    .000000   -.425719    -.416375    1.000001
      1     .290804   -.281440    .000000   -.427757    -.418393    1.000001
      2     .290805   -.281440    .000000   -.427755    -.418390    1.000000

 4    0     .301322    .301632   -.397046    .000000     .205908    1.000000
      1     .300295    .300662   -.399043    .000000     .201914    1.000000
      2     .300296    .300662   -.399042    .000000     .201916    1.000000


These figures we now take as a new starting point. Going
through exactly the same process once more we find as a
second smoothing the figures on the row ν = 2 in the various
cells of (25.4). To give an idea of the extreme rapidity with
which the process converges towards a situation where all the
4 equations are exactly compatible I give the compatibility
tables for K = 4 computed with 10 decimal places, and the
similar tables for the first and second smoothing.¹ In these
tables are added for comparison also the coefficients that served
as the starting point for each smoothing. These are the same
as those given in (25.4).

TABLE (25.5). COMPATIBILITY TABLE FOR THE UNSMOOTHED
COEFFICIENTS.

 K = 4                 i = 1           2              3              4         Absolute row sum
 (ν = 0 in (25.4))  .3013216807   .3016320903   -.3970462290   .0000000000    1.0000000000
 12                 .2993261512   .2996345051   -.4010393437   .0000000000    1.0000000000
 13                 .3043031139   .2947220753   -.4009748108   .0000000000    1.0000000000
 23                 .2941775623   .3047163358   -.4011061020   .0000000000    1.0000000001


TABLE (25.6). COMPATIBILITY TABLE FOR COEFFICIENTS
SMOOTHED ONCE.

 K = 4                 i = 1           2              3              4         Absolute row sum
 (ν = 1 in (25.4))  .3002953116   .3006615312   -.3990431573   .0000000000    1.0000000001
 12                 .3002965468   .3006626180   -.3990408351   .0000000000     .9999999999
 13                 .3002923084   .3006667436   -.3990409480   .0000000000    1.0000000000
 23                 .3003009316   .3006583501   -.3990407183   .0000000000    1.0000000000


TABLE (25.7). COMPATIBILITY TABLE FOR COEFFICIENTS
SMOOTHED TWICE.

 K = 4                 i = 1           2              3              4         Absolute row sum
 (ν = 2 in (25.4))  .3002959536   .3006620509   -.3990419956   .0000000000    1.0000000001
 12                 .3002959536   .3006620509   -.3990419955   .0000000000    1.0000000000
 13                 .3002959537   .3006620509   -.3990419955   .0000000000    1.0000000001
 23                 .3002959535   .3006620509   -.3990419956   .0000000000    1.0000000000


For i = 1, for instance, we see that the difference
between the highest and lowest of the unsmoothed coefficients
which ought to be compatible is about 0.01. Already by the
first smoothing this difference reduces to about 0.000009 and in
the second smoothing it virtually does not show up in the ten
decimal places carried.
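The whole cycle (derive the compatibility rows, average them, average again with the starting figure) is easily repeated by machine. The sketch below is one reading of (21.6) to (21.9), inferred from the printed tables rather than from the formulae themselves: each derived row for equation K is obtained by eliminating variate K between two of the other equations, reduced to absolute row sum unity and oriented like row K. The function names are invented for the illustration.

```python
from itertools import combinations
import numpy as np

# Table (25.2): regression coefficients reduced to absolute row sum unity.
a0 = np.array([[0.000000, 0.410879, -0.279582, 0.309539],
               [0.414457, 0.000000, -0.272986, -0.312557],
               [0.291813, -0.282469, 0.000000, -0.425719],
               [0.301322, 0.301632, -0.397046, 0.000000]])

def derived_rows(a, k):
    """Rows for equation k obtained by eliminating variate k between pairs of the others."""
    others = [r for r in range(len(a)) if r != k]
    rows = []
    for p, q in combinations(others, 2):
        row = a[p] * a[q, k] - a[q] * a[p, k]     # the coefficient of variate k cancels
        row = row / np.abs(row).sum()             # absolute row sum unity
        rows.append(row if row @ a[k] >= 0 else -row)   # sign factor: orient like row k
    return np.array(rows)

def smooth_once(a):
    """Average each row with the mean of its derived rows."""
    return np.array([0.5 * (a[k] + derived_rows(a, k).mean(axis=0)) for k in range(len(a))])

def spread(a, k, i):
    """Highest minus lowest derived coefficient in cell (k, i)."""
    column = derived_rows(a, k)[:, i]
    return float(column.max() - column.min())

a = a0.copy()
for step in range(3):
    print(step, f"{spread(a, 3, 0):.10f}")    # cell K = 4, i = 1
    a = smooth_once(a)
```

The printed spreads correspond to the comparison made in the text for K = 4, i = 1 before smoothing, after one smoothing and after two.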

¹ The original regression coefficients were computed with 6 decimal
places; for the purpose of the calculation here considered four zeros were
added throughout. For the experimental verification of the rapidity of
convergency this is just as good as if we had added any other figures.

As an illustration we shall also show how the compatibility
smoothing by the general method of Section 22 is carried out.

The matrix (25.1) reduced to row sum square unity* is given
in (25.8).


TABLE (25.8). REGRESSION COEFFICIENTS REDUCED TO ROW
SUM SQUARE UNITY.

 a_Ki       i = 1        2          3          4       Natural     Square
                                                       row sum     row sum
 K = 1     .000000    .701766   -.477515    .528682     .752933    1.000001
     2     .706657    .000000   -.465446   -.532915    -.291704    1.000002
     3     .495968   -.480086    .000000   -.723556    -.707674    1.000000
     4     .517202    .517735   -.681508    .000000     .353429    1.000001

 Natural
 column
 sums     1.719827    .739415  -1.624469   -.727789     .106984


The row moments of (25.8) are


TABLE (25.9). ROW MOMENTS OF (25.8).

 μ_KH     H = 1          2          3          4       Natural
                                                       row sums
 K = 1    1.000000  -0.059485  -0.719439   0.688759    0.909835
     2               1.000000   0.736072   0.682689    2.359276
     3                          1.000000   0.007958    1.024591
     4                                     1.000000    2.379406
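The two steps just tabulated, the reduction to row sum square unity and the formation of the row moments, amount to normalising each row to unit length and multiplying the matrix by its own transpose. A minimal sketch starting from the figures of Table (25.1); the variable names are chosen for the illustration.

```python
import numpy as np

# Table (25.1): diagonal regression coefficients, zero on the diagonal.
a_raw = np.array([[0.000000, 0.999896, -0.680377, 0.753281],
                  [0.999896, 0.000000, -0.658590, -0.754057],
                  [0.680377, -0.658590, 0.000000, -0.992585],
                  [0.753281, 0.754057, -0.992585, 0.000000]])

# Reduce every row to sum of squares unity, as in (25.8).
a_unit = a_raw / np.sqrt((a_raw ** 2).sum(axis=1, keepdims=True))

# Row moments: product sums of pairs of rows, as in (25.9).
mu = a_unit @ a_unit.T

print(np.round(a_unit, 6))
print(np.round(mu, 6))
print(np.round(mu.sum(axis=1), 6))    # natural row sums of the full symmetric matrix
```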


In the present case the number of independent regressions
is κ = 2. We consequently need to till the matrix (25.9) up to
and including the three-rowed sets. The result is given in
(25.10) and (25.11).

* The symbol $a_{Ki}$ in (25.8) is not exactly the same as in (25.2) because
the row reductions are now by square sums instead of absolute sums.


TABLE (25.10). TWO-ROWED TILLING TABLES.

 (12)    1          2          (23)    2          3          (24)    2          4
   1   1.000000    .059485       2   1.000000   -.736072       2   1.000000   -.682689
   2              1.000000       3              1.000000       4              1.000000
 A =    .996462    .996462     A =    .458198    .458198     A =    .533936    .533936

 (13)    1          3          (14)    1          4          (34)    3          4
   1   1.000000    .719439       1   1.000000   -.688759       3   1.000000   -.007958
   3              1.000000       4              1.000000       4              1.000000
 A =    .482408    .482408     A =    .525611    .525611     A =    .999937    .999937


TABLE (25.11). THREE-ROWED TILLING TABLES.

 (123)     1          2          3
   1      .458198   -.470074    .675654
   2                 .482408   -.693276
   3                            .996462
 A =      .000069    .000069    .000069

 (134)     1          3          4
   1      .999937    .724920   -.694484
   3                 .525611   -.503478
   4                            .482408
 A =      .000069    .000069    .000069

 (124)     1          2          4
   1      .533936    .529693   -.729369
   2                 .525611   -.723659
   4                            .996462
 A =      .000068    .000069    .000069

 (234)     2          3          4
   2      .999937   -.730639   -.676831
   3                 .533936    .494550
   4                            .458198
 A =      .000069    .000069    .000069


The cumulated matrix defined by (22.9) is given in (25.12).


TABLE (25.12). CUMULATED MATRIX.

           H = 1        2          3          4
 K = 1    1.992071    .059619   1.400574  -1.423853
     2               2.007956  -1.423915  -1.400491
     3                          2.056009   -.008928
     4                                     1.937068

 Sum of diagonal elements                 =   7.993104
 Sum of elements in north-east triangle   =  -2.796994


On the basis of the figures in (25.8) and (25.9) the averages
defined by (22.7) become as in (25.13). This table is
checked by row sums and column sums. (The column and row
sums are also needed for subsequent checks.)

TABLE (25.13).

            i = 1        2          3          4       Natural
                                                       row sum
 K = 1    -.000176    .707593   -.473185    .524663     .758895
     2     .712442   -.000178   -.461154   -.528797    -.277687
     3     .491650   -.475802   -.000022   -.729221    -.713395
     4     .513196    .513624   -.687515   -.000019     .339286

 Natural
 column
 sum      1.717112    .745237  -1.621876   -.733374     .107099


The matrix defined by (22.13) is


TABLE (25.14).

            i = 1        2          3          4       Natural
                                                       row sum
 K = 1     .000176   -.005827   -.004330    .004019    -.005962
     2    -.005785    .000178   -.004292   -.004118    -.014017
     3     .004318   -.004284    .000022    .005665     .005721
     4     .004006    .004111    .006007    .000019     .014143

 Natural
 column
 sum       .002715   -.005822   -.002593    .005585    -.000115


And the matrices defined by (22.15) and (22.10) are

TABLE (25.15).

            H = 1        2          3          4       Natural
                                                       row sum
 K = 1     .000206   -.000005   -.000045    .000052     .000208
     2                .000208    .000047    .000050     .000300
     3                           .000198   -.000001     .000199
     4                                      .000214     .000315

 Grand total  .001022


TABLE (25.16).

            H = 1        2          3          4       Natural
                                                       row sum
 K = 1     .000069    .000000    .000048   -.000049     .000068
     2                .000069   -.000049   -.000048    -.000028
     3                           .000069    .000000     .000068
     4                                      .000069    -.000028

 Grand total  .000080

The grand totals of (25.15) and (25.16) are checked by (22.17)
and (22.18) respectively.

The coefficients of the subcharacteristic polynomials $F_K(\lambda)$
(K = 1, 2, 3, 4), (that is the first principal minors in (22.19)),
can in this case be determined directly by (22.22), each
determinant being computed directly by Sarrus' rule. All the work
can be done in one stroke if one has a multiplication machine
with an extra arrangement for grand total (besides sub-total)
cumulation. The same can also be done, although not quite
as easily, if the operator uses at the same time an ordinary
multiplication machine and a listing adding machine. It is
convenient to write (or better typewrite) the columns of the
matrices μ, ν and [illegible] on loose strips and permute them so as to
obtain the various terms in (22.22). It will be found that all
the terms vanish (at least in the first 6 decimal places) except
those that contain two affixes μ. These terms are given in
(25.17).


TABLE (25.17). COEFFICIENTS OF SUBCHARACTERISTIC
POLYNOMIALS

                     K = 1        2          3          4
 Constant term      .000069    .000069    .000068    .000069
 Middle term        .000275    .000277    .000278    .000271
 Leading term       .000274    .000270    .000283    .000268


In view of the fact that the other terms vanish, each
column in (25.17) gives directly the coefficients of the polynomial
in question (with the sign of the middle term changed,
see (22.21)).

The computation of these coefficients is much simpler than
may appear from the description. As a matter of fact it is so
simple that there is no use applying any check during the
work. But the final result must of course be checked, and this
is done most conveniently by computing the values of the four
polynomials F_k for some value of λ, say λ = 0.1, according to
the formulae obtained, and verifying that these values are equal
to those obtained by evaluating the corresponding three-rowed
determinants directly after having inserted this value of λ. In
the present case this actually checked immediately for all the
four polynomials. The following values were found:
TABLE (25. 18) VALUES OF THE SUBCHARACTERISTIC POLYNOMIALS
for λ = 0.1.

           Computed by      Computed by
           (25. 17)         evaluation of the
                            determinant
F_1        0.000044         0.000043
F_2        0.000044         0.000045
F_3        0.000043         0.000042
F_4        0.000045         0.000044
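
The spot check described above can be sketched in a few lines. The 3 x 3 matrix used here is purely hypothetical (it is not the matrix of (22. 19)); it is chosen only so that its determinant is a second order polynomial in λ. The names `sarrus`, `poly` and `matrix_at` are illustrative.

```python
def sarrus(m):
    """Determinant of a 3x3 matrix by Sarrus' rule."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * e * i + b * f * g + c * d * h - c * e * g - a * f * h - b * d * i


def poly(coeffs, lam):
    """Evaluate c0 - c1*lam + c2*lam**2 (middle term entered with changed sign)."""
    c0, c1, c2 = coeffs
    return c0 - c1 * lam + c2 * lam ** 2


def matrix_at(lam):
    """Hypothetical symmetric matrix with lam subtracted from two diagonal cells."""
    return [
        [1.0 - lam, 0.5, 0.2],
        [0.5, 1.0 - lam, 0.3],
        [0.2, 0.3, 1.0],
    ]


# Coefficients worked out by hand for the hypothetical matrix above:
# det = 0.68 - 1.87*lam + lam**2.
coeffs = (0.68, 1.87, 1.0)

lam = 0.1
by_formula = poly(coeffs, lam)
by_determinant = sarrus(matrix_at(lam))
print(by_formula, by_determinant)
```

Agreement of the two numbers at a single arbitrary value of λ does not prove the coefficients correct, but an arithmetical slip in any one of them would almost certainly show up.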


Since by the argument in connection with Figure 10 of Section 
22 all the subcharacteristic polynomials must touch zero, (not 
pass through zero), the two roots of any of the second order 
polynomials obtained ought to coincide. This is actually veri- 
fied, the zeros being as indicated in (25. 19). 

TABLE (25. 19) ZEROS OF SUBCHARACTERISTIC POLYNOMIALS
(EACH ZERO IS DOUBLE).

For k =      λ
   1       0.498
   2       0.502
   3       0.491
   4       0.506


The fact that all the zeros given in the second column of
(25. 19) nearly coincide shows that it is possible to select λ in
(22. 12) in such a way that the smoothed matrix has a row
moment matrix (22. 14) which comes very near to having
simultaneously all its principal minors of order x + 1 = 3, equal to
zero. This smoothed matrix will then be very near to fulfilling the
compatibility condition we want to satisfy.
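
How closely the four zeros agree can be checked with a little arithmetic. The sketch below simply takes the values of (25. 19); the variable names are illustrative.

```python
from statistics import mean

# Double zeros of the four subcharacteristic polynomials, from (25. 19).
zeros = {1: 0.498, 2: 0.502, 3: 0.491, 4: 0.506}

average = mean(zeros.values())
spread = max(zeros.values()) - min(zeros.values())

print(f"average zero = {average:.5f}")
print(f"spread       = {spread:.3f}")
```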

The average value of the four zeros in (25. 19) is for
practical purposes exactly 0.5. In other words the present more
elaborate analysis leads to adopting just the simple kind of
average which was suggested heuristically in (21. 9). Adopting this
average, we get the following smoothed values.


TABLE (25. 20).

            i = 1         2          3          4      Natural
                                                       row sum
k = 1    -.000088    .704679   -.475350    .526672    .755913
    2     .709549   -.000089   -.463300   -.530856   -.284696
    3     .493809   -.477944   -.000011   -.726388   -.710534
    4     .515199    .515679   -.684511   -.000009    .346358
If wanted, the smoothing can be repeated, but in the present
case this will hardly be worth while.

In order to get equations that can be directly compared with
those found previously we ought to combine the coefficients
(25. 20) in such a way as to reduce the diagonal elements to
zero. Since the coefficients of (25. 20) are not yet exactly
compatible, it may have a small effect how the elimination is
performed. Doing it alternately by the various equations and
taking the average we get


TABLE (25. 21).

            i = 1         2          3          4
k = 1     .000000    .704680   -.475410    .526642
    2     .709538    .000000   -.463359   -.530790
    3     .493800   -.477952    .000000   -.726388
    4     .515194    .515685   -.684510    .000000


The results thus obtained we may now compare with the
coefficients obtained by using the "true" regressions (13. 3) by
which the data were constructed. Since the empirical regression
coefficients determined by the preceding methods are
worked out in normalised coordinates, we must either transform
(13. 3) to normalised coordinates, or transform the final values
in (25. 4) (for v = 2) and (25. 21) to non-normalised coordinates.
We prefer the former. This means that the coefficients of
(13. 3) must be reduced by the square roots of the diagonal
elements in (13. 8). Doing this we find that the four "true"
regressions in normalised coordinates are


TABLE (25. 22).

            i = 1          2           3           4       Natural
                                                           row sum
k = 1     .000000    2.013854   -1.331401    1.514070    2.196523
    2    1.993566     .000000   -1.331401   -1.514070    -.851905
    3     .996783   -1.006927     .000000   -1.514070   -1.524214
    4     .996783    1.006927   -1.331401     .000000     .672309


Reducing this to absolute row sum and row sum square equal to 
unity respectively, we obtain: 
TABLE (25. 23) "TRUE" REGRESSIONS REDUCED TO ABSOLUTE
ROW SUM 1.

            i = 1         2          3          4           Row sums
                                                       natural    absolute
k = 1     .000000    .414431   -.273989    .311580    .452022    1.000000
    2     .411976    .000000   -.275138   -.312887   -.176049    1.000001
    3     .283356   -.286239    .000000   -.430405   -.433288    1.000000
    4     .298876    .301917   -.399207    .000000    .201586    1.000000


TABLE (25. 24) "TRUE" REGRESSIONS REDUCED TO
ROW SUM-SQUARE 1.

            i = 1         2          3          4           Row sums
                                                       natural    square
k = 1     .000000    .706696   -.467211    .531313    .770798    1.000000
    2     .703107    .000000   -.469569   -.533994   -.300456    1.000000
    3     .480698   -.485590    .000000   -.730159   -.735051    1.000000
    4     .512683    .517900   -.684789    .000000    .345794    1.000000
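
The two reductions can be reproduced directly from (25. 22). The sketch below is only an illustration of the arithmetic; the function names are hypothetical.

```python
import math

# "True" regressions in normalised coordinates, rows k = 1..4 of (25. 22).
true_rows = [
    [0.000000, 2.013854, -1.331401, 1.514070],
    [1.993566, 0.000000, -1.331401, -1.514070],
    [0.996783, -1.006927, 0.000000, -1.514070],
    [0.996783, 1.006927, -1.331401, 0.000000],
]


def to_absolute_row_sum_one(row):
    """Divide a row by the sum of the absolute values of its elements."""
    total = sum(abs(v) for v in row)
    return [v / total for v in row]


def to_row_sum_square_one(row):
    """Divide a row by the square root of the sum of its squared elements."""
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm for v in row]


abs_rows = [to_absolute_row_sum_one(r) for r in true_rows]
sq_rows = [to_row_sum_square_one(r) for r in true_rows]

for k, (a, s) in enumerate(zip(abs_rows, sq_rows), start=1):
    print(k, [round(v, 6) for v in a], [round(v, 6) for v in s])
```

The first printed list for each k should agree with the corresponding row of (25. 23) and the second with that of (25. 24), up to rounding in the last decimal place.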


Comparing finally the last values (i. e. for v = 2) in (25. 4)
with (25. 23) and (25. 21) with (25. 24), we obtain the following
percentage errors:


TABLE (25. 25) PERCENTAGE DEVIATION OF EMPIRICAL REGRESSION
COEFFICIENTS FROM "TRUE" VALUES.
(Compatibility smoothing by the method of Section 21).

          i = 1       2        3        4
           Pct.     Pct.     Pct.     Pct.
k = 1       .00     -.37     1.70     1.00
    2      1.09      .00    -1.13      .44
    3      2.63    -1.68      .00     -.62
    4       .48     -.42     -.04      .00
TABLE (25. 26) PERCENTAGE DEVIATIONS OF EMPIRICAL REGRESSION
COEFFICIENTS FROM "TRUE" VALUES.
(Compatibility smoothing by the method of Section 22).

          i = 1       2        3        4
           Pct.     Pct.     Pct.     Pct.
k = 1       .00     -.29     1.75      .89
    2       .91      .00    -1.34      .60
    3      2.73    -1.60      .00     -.52
    4       .49     -.43     -.04      .00


It is seen that the empirically determined coefficients come
very close to the "true" values. On the average the error only
amounts to a fraction of one per cent. Only in a few cases
does the error run as high as one or two per cent.
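
The percentage deviations are obtained by simple division. As a sketch, the two entries for k = 1, i = 2 and i = 3 of (25. 26) can be recomputed from (25. 21) and (25. 24); the function name is illustrative.

```python
def pct_deviation(empirical, true):
    """Deviation of an empirical coefficient from its true value, in per cent."""
    return 100.0 * (empirical - true) / true


# Row k = 1, columns i = 2 and i = 3, from (25. 21) and (25. 24).
empirical = {2: 0.704680, 3: -0.475410}
true = {2: 0.706696, 3: -0.467211}

deviations = {i: round(pct_deviation(empirical[i], true[i]), 2) for i in empirical}
print(deviations)  # {2: -0.29, 3: 1.75}
```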

It is further seen that the two methods lead essentially to the
same result. In practice it therefore seems advisable to use
whenever possible the simple method of Section 21. In most cases
one single smoothing will probably be sufficient. Furthermore in
practice one will as a rule not need to carry as many decimal
places as we have used, for illustration purposes, in this
and the preceding Section. The actual work will therefore be
comparatively easy.

26. AN EXAMPLE IN 6 VARIATES FROM AMERICAN CONSUMPTION
STATISTICS. MEASUREMENTS OF THE MONEY FLEXIBILITY.

In static economic theory one studies how the individual (the 
family) distributes its resources under a given system of prices 
and a given income. By making certain assumptions about the 
manner in which the cost of living enters into this mechanism 
one concludes that for a given commodity which is not in 
substitution connection with other commodities, there ought to 
exist the following equation * : 

(26. 1)    $w(r) = \alpha \cdot u(x)$

where      r = real (deflated) income,
           x = quantity consumed per unit of time of the reference
               commodity,
(26. 2)    $\alpha = P/p$ = inverted relative price of the reference
               commodity, p being its absolute price and P the
               cost of living.

* See for instance the present author's book "New Methods of Measuring
Marginal Utility", Tübingen 1931.

The function w(r) in (26. 1) is the function that exhibits how
w, the marginal utility of the money unit, varies as a function
of the income, and the function u(x) represents how u, the
marginal utility of the commodity unit, varies as a function of
the quantity consumed. The equation (26. 1) defines a surface
in (α, r, x) coordinates, the so-called "surface of consumption".
If a set of values of (α, r, x) is observed they ought to lie on
this surface. A statistical (α, r, x) scatter should therefore give
us some information on the actual shape of the surface.

The money flexibility, namely

(26. 3)    $\check{w} = \dfrac{d \log w(r)}{d \log r} = \dfrac{dw(r)}{dr} \cdot \dfrac{r}{w(r)}$

can be determined if the shape of the consumption surface is
known. It can even be determined if only a section of the
surface is known, namely one for which x = constant. Such an
(α, r) curve is called an isoquant. From (26. 1) we see that
the logarithmic derivative of α with respect to r along any
isoquant gives the flexibility. This forms the basis of the
flexibility measurements I undertook in 1923 and 1930, and it
also served as the basis for further work done by Frederick V.
Waugh, Maurice H. Belz and myself during these gentlemen's
stay in Oslo.
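
The step from (26. 1) to the isoquant rule can be written out as follows. This is only a sketch of the reasoning already contained in the text, using the symbols of (26. 1) and assuming that w, α and u are positive so that logarithms may be taken.

```latex
\begin{align*}
\log w(r) &= \log \alpha + \log u(x)
  && \text{(logarithm of (26.1))} \\
\frac{d \log w(r)}{d \log r} &= \frac{d \log \alpha}{d \log r}
  && \text{(along an isoquant, where } x = \text{constant)}
\end{align*}
```

The term log u(x) drops out on differentiation because x is held constant, which is why no assumption about the form of u(x) is needed.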

This further work was done on American, Swedish and
Norwegian data of different kinds, household budgets as well
as time series of national income and consumption. A complete
account of the results obtained by this comparative analysis
will be published separately as another of the Oslo Institute's
publications. In the present connection I only select two sets
of data that may exemplify the use of the confluence technique
discussed in the present paper.

The data in question were collected and prepared for this
analysis by Dr. Waugh. They contain information which permits
the computation of the three variates (26. 2) for each year for
the United States as a whole. Dr. Waugh wanted to apply the
method to data relating to a whole country. In his mind the
flexibility measurement thus obtained would be more interesting
than flexibility measurements referring to particular groups
(customers in cooperative chain stores as in my study of 1923,
or working men's families as in my 1930 study). Dr. Waugh
also suggested that the rate of change with respect to time of
α and r may exert some influence on the flexibility, so that the
year to year changes of these two variates ought to be included
in the analysis as new variates. This gave the analysis a more
dynamic character. Finally, time itself was taken in as a
catch-all for the trend factors. The list of variates being thus
enlarged, some simplifications had to be made in the theoretical
scheme. In my previous studies I had used methods that did not
assume any particular form of the functions w(r) and u(x). We now
decided tentatively to work with functions that were linear in
the logarithms of the variates. This led to considering a linear
regression between some or all of the following six variates.


           $x_1 = \log \alpha$
           $x_2 = \log r$
           $x_3 = \dot{x}_1$ = year to year change of $x_1$
(26. 4)    $x_4 = \dot{x}_2$ = year to year change of $x_2$
           $x_5$ = time (linear trend factor).
           $x_6 = \log x$ (x being quantity consumed per year.
                 Provisoric means were used for the numerical work).

The intercoefficient of the variate No. 1 and the variate No. 2
in the regression equation connecting these variates will give
the money flexibility (when the variates are taken in
non-normalised coordinates).
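
As an illustration of how the six variates of (26. 4) might be assembled from yearly series, consider the sketch below. The three short input series are invented for the purpose (they are not Dr. Waugh's data), the function name `build_variates` is hypothetical, and natural logarithms are assumed. The first year is dropped because no year to year change is defined for it, and no provisoric means are subtracted.

```python
import math


def build_variates(alpha, r, x):
    """Return rows (x1, ..., x6) for years 2..n of the input series."""
    x1 = [math.log(a) for a in alpha]
    x2 = [math.log(v) for v in r]
    x6 = [math.log(q) for q in x]
    rows = []
    for t in range(1, len(alpha)):
        rows.append((
            x1[t],              # x1 = log alpha
            x2[t],              # x2 = log r
            x1[t] - x1[t - 1],  # x3 = year to year change of x1
            x2[t] - x2[t - 1],  # x4 = year to year change of x2
            float(t),           # x5 = time (linear trend factor)
            x6[t],              # x6 = log x
        ))
    return rows


# Invented series for four consecutive years.
alpha = [1.00, 1.10, 1.05, 1.20]
r = [100.0, 104.0, 101.0, 108.0]
x = [50.0, 51.0, 50.5, 52.0]

rows = build_variates(alpha, r, x)
for row in rows:
    print([round(v, 4) for v in row])
```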

The period which will be treated here is the post war period 
1919—31, and the data will be used for the two reference com- 
modities meat and butter for the United States. (In the 
complete work also other periods and commodities were con- 
sidered). The correlation matrices were 
TABLE (26. 5). CORRELATION COEFFICIENTS IN MEAT
(1919-31. U. S.).

          j = 1          2           3           4           5           6
i = 1   1.000000   -0.549342    0.212052    0.421796   -0.664092    0.511130
    2               1.000000   -0.763343    0.327788    0.680848    0.402229
    3                           1.000000   -0.534176   -0.378352   -0.497311
    4                                       1.000000   -0.203932    0.785384
    5                                                   1.000000   -0.134523
    6                                                               1.000000


TABLE (26. 6). CORRELATION COEFFICIENTS IN BUTTER
(1919-31. U. S.).

          j = 1          2           3           4           5           6
i = 1   1.000000   -0.070778    0.558808   -0.217851    0.548585    0.513617
    2               1.000000   -0.460509    0.327788    0.680848    0.775721
    3                           1.000000   -0.701721    0.122482   -0.130910
    4                                       1.000000   -0.203920    0.265165
    5                                                   1.000000    0.788472
    6                                                               1.000000


The sumsquares of the variates 1 and 2 were

                       Meat        Butter
(26. 7)    m_11 =    0.016865     0.025360
           m_22 =    0.020824     0.020824

These values are needed in order to get back from the nor- 
malised to the non-normalised variates. 
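
The step back from normalised to non-normalised coordinates amounts to rescaling a slope. The sketch below assumes that a normalised variate is the original variate divided by the square root of its sumsquare, so that a slope b' of variate 1 on variate 2 in normalised coordinates corresponds to b = b' * sqrt(m_11 / m_22) in the original coordinates. The function name and the trial slope of -1 are illustrative only.

```python
import math

# Sumsquares of variates 1 and 2, from (26. 7).
SUMSQUARES = {
    "meat": {"m11": 0.016865, "m22": 0.020824},
    "butter": {"m11": 0.025360, "m22": 0.020824},
}


def denormalise_slope(slope_normalised, m11, m22):
    """Convert a slope d(x1)/d(x2) from normalised to original coordinates."""
    return slope_normalised * math.sqrt(m11 / m22)


for commodity, m in SUMSQUARES.items():
    # A slope of -1 in normalised coordinates is used purely as a trial value.
    print(commodity, round(denormalise_slope(-1.0, m["m11"], m["m22"]), 4))
```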

From the data in (26. 5) and (26. 6) a complete tilling was
done, and the bunch map for the intercoefficient (12) was
drawn. Only the bunch map for this intercoefficient was
considered, since the main object of the analysis was to investigate
the money flexibility. The (12) bunch maps for meat and butter
are given in Figures 13 and 14. It will be noted that these maps
also contain a variate No. 7. This variate will be discussed
later. For the moment we shall consider only that part of the
maps that refers to the variates Nos. 1-6.

The bunches in Fig. 13 and 14 are drawn on scales as indicated
in the following tables:
SCALES USED IN FIG. 13, MEAT.


Set 

Scale of 
enlargement 

Set 

Scale of 
enlargement 

Set 

Scale of 
enlargement 

12 

1 

1246 

2 

123467 

10 

123 

» 

12347 

» 

123456 

20 

124 

> 



123467 

» 

125 


12345 

5 

123567 

> 

126 


12346 


124567 

> 

127 

• 

12356 

12456 

» 

1234567 

100 

1234 


12357 

> 



1235 


12457 

» 



1245 


12367 

» 



1236 


12467 




1256 

» 

12567 




1237 






1247 

> 





1257 






1267 

. ! 






SCALES USED IN FIG. 14, BUTTER. 


Set 

Scale of 
enlargement 

Set 

Scale of 
enlargement 

Set 

Scale of 
enlargement 

12 

1 

1256 

2 

12567 

20 

123 


12347 

» 

123456 

> 

124 

» 





125 

> 

1257 

5 

123457 

50 

126 ! 

. 

1267 

> 

123467 

> 

127 


12345 

> 

123567 

100 

1234 

> 

12346 

> 

124567 

t 

1235 


12356 

> 



1245 

» 



1234567 

500 

1236 


12456 

10 



1246 


12357 

> 



1237 

, 

12457 

> 



1247 


12367 

> 





12467 

> 




We shall discuss the meat map first. Looking at the three-sets
we are immediately struck by the fact that it is the set
(126) that has the best tightness. This is already a first
indication of the plausibility of the theoretical basis which we
started from, namely that there exists a structural relation
between the variates (126). Comparing the bunch in (126) with
that in (12), we see in particular that no statistically significant
information about the intercoefficient in question is obtained
before we add 6. This is a general feature which will be noted
also in all the other sets. Notice for instance how all the
bunches (12), (123), (124), (125), (1234), (1235), (1245), (12345) are
tightened when 6 is added. This gives an interesting empirical
corroboration of the "isoquant" idea (it will be remembered that
6 denotes the quantity consumed). The money flexibility as
computed by the diagonal regression in the set (126) is 0.84 ¹.
This flexibility is found simply by de-normalising the diagonal
regression coefficients by means of (26. 7). The resulting
flexibilities are indicated in the map.
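
To see why the two-set (12) by itself is uninformative, one may look at its two elementary regressions. The sketch below rests on the standard fact that, in normalised coordinates, the regression of variate 1 on variate 2 has slope r_12, while the regression of variate 2 on variate 1, drawn in the same diagram, has slope 1/r_12. Taking these two lines as the beams of the (12) bunch is an assumption about how the map is read, and the de-normalising factor sqrt(m_11 / m_22) is likewise assumed.

```python
import math

r12 = -0.549342                # correlation of variates 1 and 2 for meat, from (26. 5)
m11, m22 = 0.016865, 0.020824  # sumsquares for meat, from (26. 7)

scale = math.sqrt(m11 / m22)   # assumed factor from normalised to original coordinates

slope_1_on_2 = r12 * scale          # regression minimised in the direction of variate 1
slope_2_on_1 = (1.0 / r12) * scale  # regression minimised in the direction of variate 2

print(round(slope_1_on_2, 3), round(slope_2_on_1, 3))
```

The two slopes differ by a factor of 1/r_12^2, which is more than three here. This is one way of expressing how loose the (12) bunch is before variate 6 is added.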

Now let us proceed to the four-rowed sets to test Dr. Waugh's
suggestion that some of the other variates may exert an
influence. A glance at the map is sufficient to throw out all the
four-sets that do not contain 6. This leaves us with the 3
four-sets (1236), (1246), (1256), in other words, the 3 sets obtained
by adding to (126), 3, 4 and 5 respectively. In all these sets
there is a relatively good tightness. The fit is perhaps slightly
inferior to that in (126), but there is no "explosion".

It thus seems to be allowed to add any of the variates 3, 4
or 5 to (126). The effect on the general slopes of the bunches
produced by the inclusion of these variates is as follows:
3 makes the slope a little steeper (flexibility 0.92 as against
0.84), 4 leaves it unchanged and 5 makes it a good bit steeper
(flexibility 1.03 as against 0.84). The general slopes of the
bunches, as well as their tightness, are seen most easily from
the small vertical bars that indicate the distance between the
prolongation of the leading beams ². The steepening effect of 5
is quite significant since the (1256) sector lies entirely outside
(on the lower side) of the (126)-sector. This is interesting: it
means that the trend connection between the variates has,
by being left in the material, biassed the result observed in
the set (126).

It seems probable that the difference in slope, say in the sets
(1246) and (1256), is not due to these being slopes in two
independent equations, but due only to the bias produced by
neglecting to eliminate simultaneously the effects of the various
variates. In other words it appears to be a gross slope effect, not
a multilinear effect. In order to test this we want to see if
there is produced an explosion when we go to the higher sets.

¹ In the following I say for brevity that the flexibility is 0.84, 1.03, etc.,
meaning by this that the flexibility is -0.84, -1.03, etc. All the flexibilities
considered are negative; indeed, all the bunches considered are sloping down.

² The leading beams are marked with small circles.

Looking at the five-sets we see again that there can be
no question of accepting a set where 6 is not included. This
leaves us with the 3 sets (12346), (12356) and (12456), that is
with the sets obtained by adding (34), (35) and (45) respectively
to (126). All these sets are still fairly good; (12356) gives the
steepest slope (flexibility 1.11) which was to be expected since
in the four-sets both 3 and 5 had a tendency to steepen the
slope. In (12356) these steepening effects will be accumulated.
Again we find that 4 virtually does not change the slope. The
flexibility is 0.91 in (12346) as against 0.92 in (1236) and 1.06
in (12456) as against 1.03 in (1256).

The flexibilities in the 3 five-sets considered are somewhat
different, namely 0.91, 1.11 and 1.06 respectively, and the
tightness of the slopes is such that these differences appear
significant. We are therefore again in the situation that it is
desirable to test the higher sets. Only one such set exists,
namely the one including all the six variates considered. Even
in this set the tightness is fairly good, indicating that there is
no danger in considering this set; the tightness is even better
than in most of the subsets. In this big set the slope is the
same as in the five-set that included both the steepening
factors 3 and 5. In both cases the diagonal flexibility is 1.11. This
is another verification of the fact that 4 does not influence the
result.

It is very instructive to look at the appearance of the whole
bunch in (123456). The various beams in this total bunch give
an excellent expression for the importance of the various
variates and for the sense in which they will bias the (12) slope
if their influence is not taken account of by including them in
the regression equation. In the first place we notice that (126)
are the three important variates. Their beams are much longer
than the other beams in the big set (123456). And the relative
disposition of the three important beams remains practically
the same in all lower bunches down to the set (126). The only
effect of the other variates is that the sub-bunch of (126) is
swayed a little up or down according to which supplementary
variate(s) we include. We have seen that 4 had practically
no effect all the way through. This is summarised by the
position of the 4-beam in the set (123456). Indeed, 4 is a very
short beam: it has a small power to "enforce its will". And
furthermore its direction coincides with the average slope of
the leading beams: it "wants to go the same way" as the beams
of the two variates whose intercoefficient we are discussing.
Both these facts make it plausible that 4 does not influence the
result observed. On the other hand the variates 3 and 5 were
observed to exert a steepening influence all the way through,
the effect from 5 being the strongest. This is also summarised
in the bunch of the big set (123456) by the position of the
beams 3 and 5. Indeed, both are steeper than the leading slope
(they try to "pull down"), and 5 is the longer of the two.

When all these facts are taken into consideration, we may
formulate the following conclusion. If no disturbing factor
outside the set (123456) is taken into account, the money
flexibility computed by using meat as a reference commodity is
about 1.1; the first decimal place after the comma in this
magnitude is probably significant. At least it seems to be
beyond doubt that when the effects of the variates indicated are
eliminated, the flexibility is larger than unity. Using the
significance factor (19. 4) we may express the result in the form

(26. 8)    $\check{w}^{\text{(via meat)}}_{(123456)} = 1.11 \ \cdot/\cdot \ 0.94$

This is a figure considerably higher than the average of the
flexibilities I found in my 1930 study using United States
budget data. The difference has an interesting explanation
which will suggest itself when we have subjected butter to a
similar analysis to that which was just carried through for
meat.

The bunch map for butter is given in Figure 14. Here one
will immediately be struck by the fact that in the lower sets
the tightness is poorer than it was in meat, although butter, too,
is fairly good even in the lower sets provided 6 is included. The
need for including 6 is just as marked in butter as it was in
meat, and the reader may himself verify how the inclusion of
6 is everywhere the important thing that brings order into the
matter. This is a further corroboration of the "isoquant" idea.

Another striking feature of the map is that the money
flexibility measured with butter as a reference commodity
seems to give a substantially higher value than we got when
meat was used. Disregarding the changes produced by taking
in one or the other of the supplementary variates, we may say
roughly that the money flexibility measured via butter lies
around 1.4 or 1.5, while measured via meat we found it to
be 1.1.

Looking now at the details of the map we see that, adding 
any of the variates 3, 4 or 5 to (126), we get an improvement 
in tightness. Therefore, any of these variates must be con- 
sidered useful. 

But the effects of their inclusion are not the same: 3 is
definitely flattening the slope (the flexibility in (1236) is 1.24
as against 1.5 in (126)), 4 is also flattening the slope, but not
so much (flexibility in (1246) 1.38 as against 1.5 in (126)),
while 5 has hardly any effect on the slope; 5 has, however,
a very strong effect on the tightening. If we would only think
of the tightening effect, we would conclude that it is more
important to add 5 than 3, but, so far as the correct
determination of the net flexibility slope is concerned, it is more
important to add 3 than 5. This is another example of how
useful it is to work simultaneously with slope and tightness
criteria as we do in the bunch map analysis (as distinct from
the analysis by the test parameters of Part II, which only
express tightness).

The difference in results thus obtained in the four-sets by
adding 3, 4 or 5 makes it necessary to consider the five-sets to
see whether the difference in slope observed in the four-sets was
a gross-slope effect or a multilinear effect.

Amongst the five-sets, one, namely (12456), stands out. The
tightness in this bunch is excellent. This is highly significant.
Indeed, 4 and 5 were the two variates that produced most
tightening when added to (126). The fact that the tightening
is still better when both 4 and 5 are added means that the
difference in the slopes in (1246) and (1256) was a gross-slope
effect, not a multilinear effect. A similar conclusion is reached
in the set (12356), while in (12346) the situation is doubtful.
We need not however worry much about (12346). Already a
comparison between (12356) and (12456) shows that we are
again confronted with the same question as to whether the
difference in slopes in these two sets (flexibility 1.44 in the
former and 1.55 in the latter) shows a gross slope effect or a
multilinear effect. We therefore need to consider also the
six-set (123456). This big set actually shows the most perfect tightness
of all the sets considered. This applies not only to the leading
beams, but also to the secondary beams. There can therefore
be no doubt that this is the set to be retained. It gives a
flexibility of

(26. 9)    $\check{w}^{\text{(via butter)}}_{(123456)} = 1.40 \ \cdot/\cdot \ 0.96$

The whole appearance of the bunch (123456) verifies again
the conclusions we reached en route: 3 is flattening and 4 too,
but not so much, while 5 is hardly influencing the slope.


27. SUBSTITUTION POSSIBILITIES IN MEAT AND BUTTER. THE IN- 
CLUSION OF THE ABSOLUTE PRICE BESIDES THE RELATIVE PRICE. 

The difference in the flexibility results (26. 8) and (26. 9) is
big in comparison with the limits of significance. The whole
bunch map analysis tells us beyond doubt that the money
flexibility determined via butter is significantly larger than that
determined via meat. But from theory the two determinations
ought to be equal. The conclusion is therefore inescapable that
there must be some factor which theory did not take account
of, but which has been present in the data and distorted the
results.

It will be remembered that the consumption surface theory 
assumed no substitution possibilities between the reference 
commodity and other commodities. The following questions 
therefore naturally present themselves; Does meat stand in 
substitution relation to other goods? Does butter stand in such 
relation? And can such relations explain the difference in 
flexibility measurement obtained? 

Let us see how the consumption surface theory can be
generalised to take account of substitution possibilities. Let x
and y be the quantities consumed of two commodities, and let u
and v be their marginal utilities. If there is no substitution
possibility between the two commodities, u would be a function
only of x, and v a function only of y. If u(x) and v(y) denote
these functions, and p and q are the prices of the two
commodities, we would have in the equilibrium of the market

(27. 1)    $\dfrac{u(x)}{p} = \dfrac{v(y)}{q} = \dfrac{w(r)}{P}$


Leaving out the middle term of this equation, we get of course 
(26. 1). In other words if u depends only on x and v only on y, 
we just have the case of two independent reference com- 
modities which was the assumption underlying the preceding 
analysis of meat and butter. 

If there exist substitution possibilities, u must be looked upon
as a function both of x and y, and so must v. The equilibrium
equation now takes on the form

(27. 2)    $\dfrac{u(x, y)}{p} = \dfrac{v(x, y)}{q} = \dfrac{w(r)}{P}$

Consider for a moment the first of these equations. It defines
y as a function of x and of the price ratio

(27. 3)    $\lambda = \dfrac{p}{q}$

Let this function be

(27. 4)    $y = f(x, \lambda)$.

Inserting the expression (27. 4) for y in the first member of
(27. 2) we see that the equation of the consumption surface now
takes the form

(27. 5)    $w(r) = \alpha \, u(x, f(x, \lambda))$.

The function in the right member of this equation is some
function of x and λ which for brevity we denote

(27. 6)    $U(x, \lambda) = u(x, f(x, \lambda))$.

With this notation the equation of the surface of consumption
takes the form

(27. 7)    $w(r) = \alpha \cdot U(x, \lambda)$.

Thus the new element that comes in when substitution
possibilities exist is that the function in the right member of (26. 1)
must be conceived of as depending not only on x but also on
the ratio of the price of the reference commodity to that of the
other commodity with which it is in substitution relation. The
isoquant principle must be generalised into a principle of
constant quantity consumed and constant price ratio to the
substitution commodity, but with this generalisation the principle
applies.

If there are a number of commodities Nos. 1, 2 ... n with
which the reference commodity is in substitution relation, we
need to consider all the equilibrium equations

(27. 8)    $\dfrac{u(x, y_1, y_2, \dots)}{p} = \dfrac{v_1(x, y_1, y_2, \dots)}{q_1} = \dfrac{v_2(x, y_1, y_2, \dots)}{q_2} = \dots = \dfrac{w(r)}{P}$

where y_1, y_2 ..., q_1, q_2 ..., v_1, v_2 ... are the quantities consumed,
the prices and the marginal utilities of those commodities
with which the reference commodity is in substitution relation.
Combining the various members in (27. 8) with the first
member, we get a system of equations which we may assume
to be solved in the form

(27. 9)    $y_i = f_i(x, \lambda_1, \lambda_2, \dots)$    (i = 1, 2 ...)

where

(27. 10)   $\lambda_i = \dfrac{p}{q_i}$.

In this general case the equation of the surface of consumption
would be

(27. 11)   $w(r) = \alpha \cdot U(x, \lambda_1, \lambda_2, \dots)$

where

(27. 12)   $U(x, \lambda_1, \lambda_2, \dots) = u[x, f_1(x, \lambda_1, \lambda_2, \dots), f_2(x, \lambda_1, \lambda_2, \dots), \dots]$

The formula (27. 11) is a perfectly general formulation of the
surface of consumption idea, and if the necessary data were
available, the money flexibility could be determined as

(27. 13)   $\check{w} = \dfrac{d \log \alpha}{d \log r} = \dfrac{d\alpha}{dr} \cdot \dfrac{r}{\alpha}$

along any curve on the empirically determined surface (27. 11)
where x, λ_1, λ_2 ... etc. are all constants. Assuming for
simplicity the relations to be linear one may attempt a linear
regression analysis including the λ's as variates.

It is probable, however, that if an analysis was made according
to the above perfectly general scheme, it would be found that
multiply collinear sets would frequently occur.

As an example consider first the extreme case where the
whole structure of the market is such that the prices of the
various commodities with which the reference commodity is in
substitution relations move (nearly) proportionally to the price
of the reference commodity. In this case the function U in the
right member of (27. 11) would (nearly) depend only on x. Since
the isoquant method is independent of the shape of the function
u(x) (or U(x)) we see that in this case the correct money
flexibility would be obtained by applying the whole isoquant
technique exactly as if there had been no substitution possibilities.
Thus it is only to the extent that the prices of the substitution
commodities vary disproportionally to that of the reference
commodity, that the effect of the substitution needs to be taken
account of in the isoquant method of measuring money
flexibility.
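
The extreme case can be written out in the notation of (27. 9) to (27. 12). Suppose, as an idealisation, that each substitution price is exactly proportional to the price of the reference commodity, q_i = c_i p, and that the price ratio of (27. 10) is p / q_i. The constants c_i are introduced here only for the illustration.

```latex
\begin{align*}
\lambda_i &= \frac{p}{q_i} = \frac{1}{c_i} = \text{constant}, \\
U(x, \lambda_1, \lambda_2, \dots)
  &= u\bigl[x,\; f_1(x, 1/c_1, 1/c_2, \dots),\; f_2(x, 1/c_1, 1/c_2, \dots),\; \dots\bigr]
  \quad \text{(a function of } x \text{ only)}, \\
\check{w} &= \frac{d \log \alpha}{d \log r}
  \quad \text{along any curve } x = \text{constant}.
\end{align*}
```

The last line is the same rule as in the case without substitution, since all the price ratios are automatically held constant.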

But, of course, if the same data are used to determine the
elasticity of the reference commodity itself, one would not get
the correct result by proceeding as if no substitution
possibilities exist. Indeed, the elasticity of the function (27. 12) with
respect to x (under constant λ's) is not the same as the partial
elasticity of u(x, y_1, y_2 ...) with respect to its first variate.

The case of proportional price movements is only a very
special case of multidimensional connections between the
variates in the right member of (27. 11). Even without assuming
any such proportionality it seems plausible, and indeed
necessary, to take account of the fact that some sort of supply
relations exist for the commodities between which substitution
possibilities exist. By taking account of this we shall not only
protect ourselves against some of the risks of falling into
multicollinear situations, but we shall also obtain a theoretical
scheme which is much simpler and more amenable to
statistical analysis.

Let us assume that for each of the substitution oommoditiee 
there exists — besides the equations (27.9) for this commodity 
— some other relation that connects quantities and prices. Let 
this system of relations be 
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(27.14) Si{x,p,yi!/^...,qiq.,...) = 0 (i=l,2...) 

From this system and (27. 9) we may imagine that both the
y's and the q's are expressed in terms only of x and p

(27. 15)    y_i = g_i(x, p)    q_i = G_i(x, p)    (i = 1, 2, ...)

Inserting this into (27. 8) we see that the equation of the
surface of consumption now takes on the form

(27.16) w[r)^a^ U(x,p) 

where U is a function only of the two variates x and p, namely

(27. 17)    U(x, p) = u(x, g_1(x, p), g_2(x, p), ...)

In other words, the only new variate which we now need to
take account of in the right member of the equation of the
surface of consumption is the absolute price of the reference
commodity. The absolute price is, so to speak, taken as a 
catch-all for all the various possible substitution effects that 
may concern this commodity in its capacity of reference 
commodity for money flexibility measurements. We do not go 
into any specification of which other commodities the reference 
commodity is in substitution connections with. 

The assumption (27. 16) was used on the meat and butter
data. The relation was assumed linear in log p, so that the only
modification in the preceding analysis consisted in including
the variate

(27. 18)    x_7 (meat)   = log p (meat)   - 1.4
            x_7 (butter) = log p (butter) - 1.5

(1.4 and 1.5 being provisional means).

Since concentric numbering of the subsets had been used,
the additional tilling work needed could be performed simply
by elongating the lists and tables previously used. The results
of the computations are represented graphically in Figures 13
and 14 in the preceding Section. We shall now consider these
charts in their entirety, not only the cells belonging to the
first six variates. We shall go through the analysis anew, this
time taking into account also those subsets where 7 occurs.

We shall take meat first. The three-set (126) in meat is still
the only one that can be used, and in the four-sets we find as
before that only those where 6 occurs have any organisation.
This leaves us with only one new four-set to be considered,
namely (1267). This set is quite remarkable. The organisation
here is very good. Indeed, it is markedly better than the
organisation in the best four-sets we had before 7 was added.
And the change in slope is quite conspicuous. We notice that
there is a real fight between 5 and 7 in the influence on the
slope. The inclusion of 5 (the trend factor) had, as we
noticed in the preceding analysis, a tendency to increase
the flexibility, while we now see that the inclusion of 7 (the
substitution factor) tends to lower the flexibility, i. e. to flatten
the slope: in the set (1267) the flexibility is 0.75 as against 1.03
in the set (1256). This suggests that substitution does play a
role, and that it must be taken into account in order to get a
correct measurement of the money flexibility. Of course the
variate (price) now considered does not represent the ordinary
demand curve connection between price and quantity; that is
already taken into account by the variate No. 1. The variate
No. 7 represents a deviation from the regular demand curve,
namely a deviation caused by substitution.

The difference in the slopes in the various four-sets that
appear significant makes it necessary to consider the five-sets.
Amongst the five-sets containing 7, (12467) shows an excellent
tightness. It is suggestive of the importance of 6 to compare
(12457) with (12467). Further, we see that the fight between
5 and 7 is also quite marked in the five-sets. In the two
admissible sets where 5 (and not 7) is present, namely (12356)
and (12456), the flexibility is 1.11 and 1.06 respectively, while
in the two admissible sets where 7 (but not 5) is present,
namely (12367) and (12467), the flexibility is 0.80 and 0.75
respectively. If the difference between these slopes is a
gross-slope-effect, it seems probable that the true flexibility
must be somewhere between these values, say between 0.90 and
0.95. It is interesting to compare this with the result in the set
(12567) where both 5 and 7 are present (besides (126) which
always proves to be necessary for a good fit). The tightness of
(12567) is none too good, but it does show some organisation,
and the flexibility measured by the diagonal regression is in the
range 0.90 to 0.95.

Also in the six-sets the fight between 5 and 7 is the
dominating feature; compare, for instance, (123456) which contains
5 (trend) but not 7 (substitution) and which gives the flexibility
of 1.11, with (123467) which contains 7 but not 5 and gives the
flexibility of 0.79. If both trend and substitution are taken into
account, it seems plausible that the flexibility will turn out to
be somewhere midway between these figures, say 0.95. This
also checks fairly well with (124567) which contains both 5 and
7, and gives the flexibility 0.99. The seven-set is of interest
because it has sufficient tightness to indicate that there is no
"explosion" and that the slope difference observed in the
six-sets is consequently a gross-slope-effect, so that it has a
meaning to take some sort of an average between them. But
the slope in the seven-set is not so clearly defined that it seems
advisable to adopt the diagonal flexibility in this set as a better
result than the above average of 0.90 to 0.95. Taking into
account all available facts, it seems that when the substitution
possibilities are eliminated, we may put

(27. 19) - = between 0.95 and 1.00 

with a range of variation of some 5 per cent up or down. 
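
The averaging argument in the last two paragraphs is simple arithmetic, and a small sketch may make it easier to follow. The figures are the flexibilities quoted above for meat; treating the "somewhere between" value as a plain midpoint is an assumption made here for illustration, not a rule stated in the text.

```python
# Flexibilities quoted in the text for meat (diagonal regression).
five_sets_trend = {"12356": 1.11, "12456": 1.06}   # contain 5 (trend), not 7
five_sets_subst = {"12367": 0.80, "12467": 0.75}   # contain 7 (substitution), not 5
six_set_trend = 1.11    # (123456)
six_set_subst = 0.79    # (123467)


def mean(values):
    """Plain arithmetic mean of a collection of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Midpoint between the two groups of five-sets (assumed reading of "between").
five_midpoint = mean([mean(five_sets_trend.values()),
                      mean(five_sets_subst.values())])

# Midpoint between the two six-sets.
six_midpoint = mean([six_set_trend, six_set_subst])

print(f"five-sets midpoint: {five_midpoint:.2f}")
print(f"six-sets midpoint:  {six_midpoint:.2f}")
```

Both midpoints fall inside or at the edge of the range 0.90 to 0.95 discussed in the text, and close to the values read off the sets that contain both 5 and 7.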

In conclusion we may thus say that substitution has some
effect, but not a very great one, in the case of meat. The
money flexibility measured via meat is lowered when this
correction is taken into account.

Now, as to butter. Here there is a high gross correlation
between 1 and 7, namely r_17 = -0.954. This is simply due to
the fact that the butter price p has fluctuated much more
violently than the cost of living price P. When this is the case,
there will of course be produced a high (negative) correlation
between the variates 1, which is (log P - log p), and 7, which
is log p - 1.5. We must therefore be prepared to find that the
inclusion of the variate 7 with those already discussed will
in the case of butter produce a much larger opening of the
bunches, i. e. a less definite slope indication than in the
previously considered sets not including 7. That this actually
happens is seen by a glance at the bunch map in Figure 2 of
Section 26. The question is whether the change in slope
produced by the inclusion of 7 is so marked that significant
conclusions can be drawn even though the new bunches are more
open. The whole situation is such as to put our apparatus of
confluence criteria to an interesting test.
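
The size of this correlation can be gauged with a short calculation. The sketch below computes the correlation between the variates 1 and 7 from the variances of log P and log p. The function name, the variance figures and the default of zero covariance are assumptions chosen for illustration, not quantities taken from the butter data.

```python
import math


def corr_1_7(var_log_P, var_log_p, cov_Pp=0.0):
    """Correlation between variate 1 = log P - log p and variate 7 = log p - const.

    var_log_P, var_log_p: variances of log P and log p
    cov_Pp: covariance between log P and log p (hypothetical default: 0)
    """
    cov_17 = cov_Pp - var_log_p                     # cov(log P - log p, log p)
    var_1 = var_log_P + var_log_p - 2.0 * cov_Pp    # var(log P - log p)
    var_7 = var_log_p                               # the constant drops out
    return cov_17 / math.sqrt(var_1 * var_7)


# Hypothetical case: the variance of log p is ten times that of log P.
r = corr_1_7(var_log_P=1.0, var_log_p=10.0)
print(round(r, 3))
```

Under these assumed figures the correlation is already strongly negative, of the same order as the value observed for butter.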





The new three-set obtained by adding 7, namely (127), does
not show a very good tightness, but at least so much of it that
the set cannot be disregarded altogether. The explanation of
the amount of organisation here found is probably as follows.
The variate 1 is (log P - log p) and 2 is (log ρ - log P), ρ being
the nominal income; since log P occurs here in both expressions
and with opposite sign, some negative correlation between 1
and 2 may be expected. There actually turns out to be very
little of it (as is seen by the appearance of the bunch in the
set (12)). This is due to the variation of log p and log ρ. But
if the effect of log p is eliminated, as it is in the set (127), the
fit is improved. The (12) slope is not -1 as it would have been
if it had expressed only the definitional connection due to the
terms log P and -log P. The slope is flatter, which is
explained by the presence of log ρ in the variate No. 2. Indeed,
on account of the whole economic structure, ρ and P tend to
vary in the same direction, and this will work counter to the
negative connection between 1 and 2 that is created by the
terms log P and -log P. The question of whether this may be
responsible for a "spurious" element in the determination of the
money flexibility will be discussed in Section 28. At present we
shall continue the systematic analysis of the bunch map.

In the four-sets we notice the same fight between 5 and 7 
as we found in meat: 5 steepens the slope (a little) and 7 
flattens it (quite a bit). It is true that the organisation of the 
four-bunch containing 7 (in addition to the fundamental set 
(126)) is not good (the reasons for this we have already dis- 
cussed). But even though the bunch in (1267) is rather open, 
it is quite obvious that it represents a flatter slope than in 
(1256). In (1267) the diagonal flexibility is 0.80 as against 1.47 
in (1256). The big difference between these two measurements 
makes it necessary to consider the five-sets.

The five-set that contains 5 and 7 (in addition to (126)) is
markedly tighter than (1267), and the steepening effect is clear:
5 is therefore useful when added to (1267). In going from
(1256) to (12567) the change is less clear. There is, as in
all cases where 7 is added to a set containing (126), a
marked opening of the bunch. The change in slope is somewhat
veiled by the smaller tightness in the new bunch, but some
flattening seems to be present. The diagonal flexibility in
(12567) is 1.26 as against 1.47 in (1256).

After the inclusion of 7 there is also another feature which
becomes apparent. 4, and to a still higher degree 3, have a
tendency to flatten the slope as compared with 5. This effect
is very marked: for instance, if 3 is added to (1267) we get a
diagonal flexibility of 0.63, while if 5 is added, we get 1.26.
And the bunches, while not very tight, still have so much
organisation that they cannot be disregarded. All this makes it
necessary to consider the six-sets. The tightness here is not
any poorer than in the five-sets, and the pulls of the various
variates 3, 4, 5 and 7 in different directions are still manifest.
This suggests that as a final attempt we must go to the big
seven-set. This final step turns out to be rather significant. The
tightness in this bunch is decidedly better than in any of the
six-sets with the exception of (123456). And with regard to
the passage from this latter set the change in slope is here very
conspicuous. One would therefore, without hesitation,
characterise all the variates in the seven-set as useful. The
regression in this set must consequently be considered as
admissible. In this regression the diagonal flexibility is

(27. 20) - - 0.97 •/• 0.75. 

Thus by taking into account substitution possibilities, we get
sensibly the same results for the money flexibility via meat
and via butter. Not only that, but the values found are now
closer to that which I found by using the United States
household budget of 1918/19. Still there is a good bit of
difference: the average values on the flexibility curve I found
ranged around 0.5. The difference may in part be due to the fact
that I was using exclusively city families, while the present data
include the United States as a whole. Another, and probably
more important, fact is, I think, what may perhaps be called
the "silk-shirt mentality". The data for my investigations
were collected in a period of prosperity when incomes were
increasing and people were in an "active" consumption mood:
they wanted to expand consumption both quantitatively and to
new categories of goods. In other words, they were just in
that situation which would give a low money flexibility. The
data used in the present study cover a period where there
have been both ups and downs, and certainly not any steady
"silk-shirt mentality".

28. DEFINITIONAL FLATTENING AND SPURIOUS CORRELATION.

In Section 7 we considered each observational variate as a
linear form in certain basic, not directly observable, variates.
What further complications may arise if some of the
observational variates can be expressed as linear forms in some
of the other observational variates? To what extent will this
cause "spurious" correlation? I have preferred not to discuss
this problem in the theoretical part III, but rather to treat it
in connection with the study of the consumption data: this will
make the discussion more concrete.

Consider the two variates 


(28. 1)     x_1 = log P - log p
            x_2 = log ρ - log P


that occur in the flexibility study, p being the price of the
reference commodity, P the cost of living and ρ the nominal
income. The fact that x_1 contains log P and x_2 contains -log P
will tend to produce a negative correlation between x_1 and x_2.
Will this cause "spurious" results? That will depend on the
nature of the variability of log p, log P and log ρ. If there is
no significant correlation between these latter variates, then
the observed correlation between x_1 and x_2 would have to be
interpreted as spurious. But if there is some significant
connection between all or some of the variates log p, log P and
log ρ (as there certainly is, for instance, between log P, P being
the cost of living, and log ρ, ρ being the nominal income,
these two variates having roughly speaking a tendency to move
in the same direction), then an observed correlation between
x_1 and x_2 cannot be interpreted only as spurious. It may in part
be spurious, namely to the extent that log p, log P and log ρ
contain outside components, that is to say, components that must
be looked upon as disturbances in the system of the other
variates considered. But to the extent that the observed
correlation between x_1 and x_2 is due to the systematic
components of log p, log P and log ρ, we cannot consider it as
spurious.
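
To make the first case concrete, here is a short worked derivation (a sketch, not taken from the text) of the correlation that the shared term log P alone induces between x_1 and x_2. The symbols \sigma_p^2, \sigma_P^2 and \sigma_\rho^2 for the variances of log p, log P and log ρ are introduced here for illustration, as is the premise that the three logarithms are mutually uncorrelated.

```latex
\begin{aligned}
\operatorname{cov}(x_1, x_2)
  &= \operatorname{cov}(\log P - \log p,\; \log\rho - \log P) = -\sigma_P^2, \\
\operatorname{var}(x_1) &= \sigma_P^2 + \sigma_p^2, \qquad
\operatorname{var}(x_2) = \sigma_\rho^2 + \sigma_P^2, \\
r_{12} &= \frac{-\sigma_P^2}
               {\sqrt{(\sigma_P^2 + \sigma_p^2)\,(\sigma_\rho^2 + \sigma_P^2)}} .
\end{aligned}
```

With equal variances this gives r_12 = -1/2, and the sign is negative whatever the variances. When the logarithms do move together, additional covariance terms enter and the formula no longer applies; that is the situation described above for log P and log ρ.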

If we did that, we could just as well turn the matter around
and say for instance: We have

(28. 2)     log P = x_1 + log p
            log ρ = x_1 + x_2 + log p

and since the observable variate x_1 here occurs both in the
expressions for log P and log ρ we must be prepared to find a
"spurious" correlation when we observe log P and log ρ
together. It is clear that this argument leads in a circle; indeed,
whatever variates we have given, we can always introduce
some new variates linearly dependent on the old, and then
write some of the old variates as linear forms where some of
the new variates occur. The mere fact that two observational
variates can be expressed as linear forms in certain other
observational variates does therefore not in itself constitute a
"spurious" element. All depends on the nature of the
variability of these other variates by which the first variates are
expressed.

In particular if we have arrived a priori at certain theoretical 
relations whose coefficients we attempt to determine statisti- 
cally, it may well be that some of the observational variates 
entering can be expressed in terms of others. This transforma- 
tion may then only be an alternative way of formulating the 
problem and need not introduce anything "spurious". A
successful determination of the coefficients sought may be
possible by using either form of the equations. Whether the
attempt shall meet with success will depend primarily on the
actual strength with which the postulated structural relation 
exists in the variates. If this underlying structural relation is 
strong, it will be reflected and may be measured in either form 
of the equations. 

One thing we must look out for, however, is that a number of 
degrees of freedom is left which is sufficient for the regression 
analysis contemplated. By this I mean the following. Quite gen- 
erally let 

(28. 3) 

be a set of variates which are or could be observed. 

In the above flexibility example we could, for instance,
consider x_1 and x_2, and further log p, log P, log ρ and log x, x
being the quantity consumed, giving a total of n = 6 variates.
All of these would of course not be independent: (28. 1) (or,
if we prefer, (28. 2)) constitute two independent equations
between the six variates. Therefore, if we plotted a scatter
diagram in 6 dimensions, this scatter would have an unfolding
capacity of not more than 4. We shall say that this scatter
has a definitional flattening of 2. In addition to this some
structural flattening may exist, for instance, the one represented
by the surface of consumption. Quite generally, if between those
variates that are included in an observed scatter there exist
by definition φ independent linear equations, we shall say that
the scatter has a definitional flattening of φ. If in addition the
structural conditions impose ψ further relations between the
variates, we shall say that the structural flattening is ψ. The
number χ = φ + ψ is the total flattening. And it is this total
flattening that is revealed by the confluence analysis.

In order that a regression equation fitted to the data shall
give any information at all about a structural relation present
in the data, it is of course necessary that the scatter to which
the equation is fitted has no definitional flattening: indeed, a
regression equation has a sense only when the scatter is
flattened exactly once, and if exactly one definitional flattening is
present, this is the only thing that will show up in the empirical
regression equation fitted to the data. A definitional flattening
is indeed always mathematically exact, while the structural
flattenings are always somewhat blurred by the presence of the
disturbances. They cannot therefore compete with the
definitional flattening in influencing the result obtained. If there
is more than one definitional flattening in the set of those
variates that are subjected to the regression analysis, the
regression coefficients will of course be exactly of the
indeterminate 0/0 form.

As examples we may notice that there is no definitional
flattening in the set consisting of the two variates x_1 and x_2
considered above, nor is there any such flattening if log x
(x = quantity consumed) is added to x_1 and x_2; indeed, even
taking into account the way in which x_1 and x_2 are defined (in
terms of p, P and ρ), x_1 and x_2 and log x may be chosen quite
arbitrarily. Nor is there any definitional flattening in the set
consisting of the three variates x_1, x_2 and log p. But in the set
of four variates x_1, x_2, log p and log ρ there exists a
definitional flattening, namely the last equation in (28. 2). A
regression in this set of 4 variates would give a mathematically
exact fit and would therefore be useless for any investigation of
structural relations.
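
The counting of definitional flattenings can be checked numerically. The sketch below builds the six variates from made-up observations of log p, log P, log ρ and log x (hypothetical numbers, not the butter data) and measures the flattening of a set as the number of variates minus the rank of the data taken about their means. The helper names `centered`, `rank` and `flattening` are introduced here for illustration.

```python
def centered(columns):
    """Measure every variate from its mean; return the data as rows."""
    means = [sum(c) / len(c) for c in columns]
    n_obs = len(columns[0])
    return [[c[i] - m for c, m in zip(columns, means)] for i in range(n_obs)]


def rank(rows, tol=1e-9):
    """Rank by Gaussian elimination with partial pivoting."""
    a = [r[:] for r in rows]
    n_rows, n_cols = len(a), len(a[0])
    rk = 0
    for c in range(n_cols):
        if rk == n_rows:
            break
        piv = max(range(rk, n_rows), key=lambda r: abs(a[r][c]))
        if abs(a[piv][c]) < tol:
            continue                      # no pivot in this column
        a[rk], a[piv] = a[piv], a[rk]
        for r in range(rk + 1, n_rows):
            f = a[r][c] / a[rk][c]
            for k in range(c, n_cols):
                a[r][k] -= f * a[rk][k]
        rk += 1
    return rk


def flattening(columns):
    """Number of exact linear relations among the variates in the set."""
    return len(columns) - rank(centered(columns))


# Hypothetical observations (six periods), for illustration only.
log_p   = [0.0, 1.0, 0.0, 0.0, 2.0, 1.0]
log_P   = [0.0, 0.0, 1.0, 0.0, 1.0, 3.0]
log_rho = [0.0, 0.0, 0.0, 1.0, 1.0, 2.0]
log_x   = [0.0, 0.0, 0.0, 0.0, 1.0, 5.0]

x1 = [P - p for P, p in zip(log_P, log_p)]        # first line of (28. 1)
x2 = [r - P for r, P in zip(log_rho, log_P)]      # second line of (28. 1)

print(flattening([x1, x2, log_p, log_P, log_rho, log_x]))   # all six variates
print(flattening([x1, x2, log_x]))
print(flattening([x1, x2, log_p]))
print(flattening([x1, x2, log_p, log_rho]))
```

A set with a flattening of 1, such as the last one, would be fitted exactly by a regression whether or not any structural relation is present in the data.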

None of the sets actually used for flexibility measurements in
the preceding Sections contains a definitional flattening, but
conceivably the variability nature of the basic variables
involved may be such that some spurious effect is produced. For
instance, if it should happen that log p, log P, log ρ and log x
were entirely disorganised without being influenced by that
systematic connection which we are looking for, namely the
surface of consumption, then the observed correlation in the
various x's and in particular the (12) slope would be spurious,
caused by the fact that the x's are linear forms in the variates
log p, log P, etc. In particular the fact that the (12) slope
happens to turn out rather close to -1, that is to say equal to
that value which in view of (28. 1) would come out if log p, log
P and log ρ were disorganised, may make us suspicious.

However, there are various reasons for rejecting this
alternative. In the first place it is both for theoretical and
empirical reasons out of the question to assume log p, log P and
log ρ independent. Roughly speaking there will be a tendency for P
and p to move together, and also a tendency for ρ and P to
move together. From (28. 1) it is therefore seen that what the
(12) coefficient does measure is the amount of deviation which
the material shows from the proportionality between P and p
on the one hand and ρ and P on the other. This deviation can
of course go in either direction. When we have found that it
does go so definitely in the negative direction (giving a
negative (12) slope), this must be due to some structural relation
in the data. Furthermore, we have in all instances noticed that we
get organisation in the bunches only by taking into account
the quantity consumed, which is a variate that has no
definitional connection whatsoever with p, P or ρ. This is another
strong indication that the result expresses a structural
connection, not simply a spurious relation due to the presence of
log P in x_1 and -log P in x_2.

Finally, we may test the spurious factor in the (12) coefficient
by the following experimental modification of the data. Let x_1
and x_2 be defined by (28. 1) and let us take exactly those values
of p, P and ρ that were observed in the United States butter
data 1919-31 and used in the analysis of Sections 26 and 27.
But let us imagine that the quantity consumed had been
different. Let us put its logarithm equal to

(28. 4)     x_8 = x_1 - 0.7 x_2 + 0.12 x_7 + e

where x_1 and x_2 are defined as above, x_7 is log p (as in Section
27) and e an erratic variate (lottery drawings) whose standard
deviation is about 15 per cent of that of [illegible]. The equation
(28. 4) would now be the modified form of the surface of
consumption. The term with x_7 in (28. 4) represents the
substitution effect. We have chosen the coefficient in front of x_2
negative; this means that the money flexibility will now be
positive, namely +0.7. Of course, this is a very unrealistic
assumption, and it is made here only to get a clear cut case where
the money flexibility is definitely different from -1. The object
of the experiment is just to see if, by using the same method as in
the preceding Sections (now with x_8 instead of x_6), we obtain
that value for the money flexibility which would now be correct,
namely +0.7, or if we still obtain some value near -1. The
results of the computations are given in Figure 15. (The scale in
1278 is 10 times the scale in 128).
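
The logic of the experiment can be sketched in a few lines. The example below uses made-up values for log p, log P and log ρ (hypothetical, not the butter series), builds x_8 from (28. 4) with the erratic term set to zero, and fits x_1 on x_2, x_7 and x_8 by ordinary least squares. Ordinary least squares is a stand-in chosen for brevity; it is not the diagonal regression or the bunch-map procedure used in the text, and the helper names are introduced here.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def center(v):
    """Measure a variate from its mean."""
    mean = sum(v) / len(v)
    return [t - mean for t in v]


def least_squares(y, regressors):
    """Coefficients of y on the regressors, all measured from their means."""
    y = center(y)
    xs = [center(x) for x in regressors]
    xtx = [[sum(u * w for u, w in zip(xi, xj)) for xj in xs] for xi in xs]
    xty = [sum(u * w for u, w in zip(xi, y)) for xi in xs]
    return solve(xtx, xty)


# Hypothetical observations (six periods), for illustration only.
log_p   = [0.0, 1.0, 0.0, 0.0, 2.0, 1.0]
log_P   = [0.0, 0.0, 1.0, 0.0, 1.0, 3.0]
log_rho = [0.0, 0.0, 0.0, 1.0, 1.0, 2.0]

x1 = [P - p for P, p in zip(log_P, log_p)]
x2 = [r - P for r, P in zip(log_rho, log_P)]
x7 = [p - 1.5 for p in log_p]
e = [0.0] * len(x1)          # erratic term switched off in this sketch
x8 = [a - 0.7 * b + 0.12 * c + d for a, b, c, d in zip(x1, x2, x7, e)]

coef_x2, coef_x7, coef_x8 = least_squares(x1, [x2, x7, x8])
print(round(coef_x2, 6), round(coef_x7, 6), round(coef_x8, 6))
```

With the erratic term removed, the fitted coefficient of x_2 is the constructed value +0.7 rather than anything near -1, which is the outcome the experiment is designed to detect. With a heavy erratic component, as in the text, an estimate would only approximate it.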



From these graphs we see immediately that the general slope
of the (12) connection is now positive. Further, we notice that
the inclusion of 7 to the set that from theory is the fundamental
one, namely (128), will also now make the fit poorer. This is
what we would expect since there exists such a high
correlation between 1 and 7. (Both 1 and 7 are, it will be
remembered, the actual data). The opening of the bunch is,
however, not so strong as completely to veil the fact that some
change in slope is produced. By the standard rules for the
analysis of the bunch map it appears admissible to consider the
four-set (1278). In this set the diagonal flexibility turns out to be

(28. 5)     + 0.57 ·/· 0.69.

It is seen that the "true" flexibility according to (28. 4),
namely +0.7, is amply contained between the upper and lower
limits of (28. 5), namely +0.83 and +0.39. The empirical value
0.57 is not so very much off the true value; the discrepancy that
does exist is of course due to the rather heavy erratic
component which we have purposely introduced in x_8.
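
The limits quoted for (28. 5) follow from the central value 0.57 and the factor 0.69 if the sign ·/· is read as "multiply or divide by". That reading is an inference from the figures quoted here, not a definition given in this passage, and the function name is hypothetical.

```python
def flexibility_limits(central, factor):
    """Lower and upper limits obtained by multiplying and dividing by the factor."""
    low, high = sorted((central * factor, central / factor))
    return low, high


low, high = flexibility_limits(0.57, 0.69)
print(f"{low:.2f} to {high:.2f}")
print(low <= 0.7 <= high)    # is the constructed flexibility inside the limits?
```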

Since the money flexibility in the determinations of Section
27 was decidedly negative, and since all the data are now
exactly the same as before, except the variate which is now
denoted as No. 8 and made to fit the artificial surface of
consumption (28. 4), we must conclude that this surface when it
exists does play a dominating role in determining the observed
flexibility. It seems that this flexibility as determined by the
actual data is not due to a spurious connection between x_1 and
x_2 created through the presence of log P and -log P in these
variates, but is a reality.

29. SCATTERANCES AND LINE COEFFICIENTS IN THE
CONSUMPTION DATA.

The scatterances in the meat and butter data are given in
tables (29. 1) and (29. 2) respectively.

In tables (29. 1) and (29. 2) are indicated also the sums of the
scatterances on each level. These sums are nothing but the
coefficients of the characteristic polynomial P(A). These
coefficients are used amongst other things for checking purposes
in connection with (29. 4).
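
The statement about the characteristic polynomial can be checked on a small case. The sketch below assumes that the scatterance of a set is the determinant of the correlation matrix of its variates (an assumption that agrees with the tabulated figures, for instance the butter entry for (17) and the correlation -0.954 quoted in Section 27), and it uses a hypothetical three-variate correlation matrix. It sums the scatterances on each level and compares the polynomial built from these sums with det(λI - R) evaluated directly. The sign convention of the polynomial is also an assumption.

```python
from itertools import combinations


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[piv][c] == 0.0:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d                         # a row swap changes the sign
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def scatterance_sums(R):
    """Sum of the k-rowed principal minors of R for k = 1..n."""
    n = len(R)
    sums = {}
    for k in range(1, n + 1):
        sums[k] = sum(
            det([[R[i][j] for j in idx] for i in idx])
            for idx in combinations(range(n), k)
        )
    return sums


def poly_from_sums(R, lam):
    """lam^n - S1 lam^(n-1) + S2 lam^(n-2) - ... built from the level sums."""
    n = len(R)
    sums = scatterance_sums(R)
    return lam ** n + sum((-1) ** k * sums[k] * lam ** (n - k)
                          for k in range(1, n + 1))


# Hypothetical correlation matrix for three variates.
R = [[1.0, 0.5, 0.2],
     [0.5, 1.0, 0.3],
     [0.2, 0.3, 1.0]]

lam = 2.0
direct = det([[(lam if i == j else 0.0) - R[i][j] for j in range(3)]
              for i in range(3)])
sums = scatterance_sums(R)
print(sums)
print(poly_from_sums(R, lam), direct)
```

The two printed values in the last line agree, which is the property that makes the level sums usable as a check on the computed scatterances.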

Consider meat first. None of the two-sets shows any good
correlation. If the data are to have any meaning in flexibility
analysis, we must therefore proceed to the three-sets. This is
also in agreement with the theoretical pattern which requires
the inclusion of at least (126).
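
As a check on how the two-rowed and three-rowed figures hang together, the sketch below rebuilds the scatterance of (126) for meat from three two-rowed entries of Table (29. 1). It assumes that a scatterance is the determinant of the correlation matrix, so that a two-rowed scatterance equals 1 - r^2. The tables do not give the signs of the correlations; the product of the three signs is taken as negative here because that choice reproduces the tabulated value. The function name is hypothetical.

```python
import math


def three_set_scatterance(s_ab, s_ac, s_bc, sign_product):
    """Determinant of a 3x3 correlation matrix from its two-rowed scatterances.

    s_ab, s_ac, s_bc: two-rowed scatterances, each assumed equal to 1 - r^2
    sign_product: +1 or -1, the sign of r_ab * r_ac * r_bc
    """
    r2_ab, r2_ac, r2_bc = 1.0 - s_ab, 1.0 - s_ac, 1.0 - s_bc
    triple = sign_product * math.sqrt(r2_ab * r2_ac * r2_bc)
    return 1.0 - r2_ab - r2_ac - r2_bc + 2.0 * triple


# Two-rowed scatterances for meat, read from Table (29. 1).
s12, s16, s26 = 0.698223, 0.738746, 0.838212

s126 = three_set_scatterance(s12, s16, s26, sign_product=-1)
print(round(s126, 4))        # tabulated value for (126): 0.049302

# The same reading links the butter entry for (17), 0.090008, to the
# correlation quoted in Section 27: |r| = sqrt(1 - 0.090008).
r17_abs = math.sqrt(1.0 - 0.090008)
print(round(r17_abs, 3))
```

The three two-rowed inputs are the subscatterances of (126); each is fairly large taken alone, while the three-rowed figure built from them is small.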

Amongst the three-sets (126) stands out very conspicuously.
The scatterance in this set is equal to 0.049, while the smallest
of the other three-rowed scatterances, that in (157), is
0.136. The subscatterances in (126), namely 0.69, 0.73 and 0.83,
are not widely different, but they are so large that it seems
safe to consider (126) as a promising set according to (III) of
Section 1. If a three-set is to be accepted, it must undoubtedly
be (126).

TABLE (29. 1). SCATTERANCES, MEAT 1919-31.


Two-sets         Three-sets         Four-sets            Five-sets
Set  Scatterance Set   Scatterance  Set    Scatterance   Set      Scatterance

12   0.698223    123   0.261244     1234   0.088678      12345    0.029444
13   0.955034    124   0.260963     1235   0.106864      12346    0.005990
23   0.432474    134   0.396221     1245   0.086653      12356    0.004493
14   0.822088    234   0.303500     1345   0.192124      12456    0.003716
24   0.892555    125   0.290416     2345   0.100775      13456    0.051034
34   0.714656    135   0.477426     1236   0.018238      23456    0.033403
15   0.558982    235   0.213892     1246   0.018074      12347    0.041601
25   0.536446    145   0.453729     1346   0.111493      12357    0.022068
35   0.856850    245   0.296388     2346   0.108935      12457    0.017472
45   0.958412    345   0.447485     1256   0.012920      13457    0.045917
16   0.738746    126   0.049302     1356   0.166484      23457    0.047471
26   0.838212    136   0.338659     2356   0.108264      12367    0.006865
36   0.752682    236   0.324755     1456   0.136980      12467    0.006671
46   0.383172    146   0.282653     2456   0.098441      13467    0.053184
56   0.981904    246   0.321039     3456   0.167140      23467    0.068580
17   0.503327    346   0.267787     1237   0.128173      12567    0.002771
27   0.912882    156   0.370955     1247   0.123609      13567    0.035001
37   0.997420    256   0.282881     1347   0.190485      23567    0.027692
47   0.915905    356   0.540812     2347   0.224222      14567    0.028330
57   0.992350    456   0.366580     1257   0.062455      24567    0.023410
67   0.870223    127   0.342973     1357   0.114387      34567    0.128421
Sum  16.312543   137   0.470964     2357   0.168659      Sum      0.683034
                 237   0.365366     1457   0.108869
                 147   0.413727     2457   0.141342      Six-sets
                 247   [illegible]  3457   [illegible]   Set      Scatterance
                 347   0.612243     1267   0.018567      123456   0.001216
                 157   0.136530     1367   0.163270      123457   0.005820
                 257   0.476831     2367   0.204509      123467   0.002209
                 357   0.849981     1467   0.142228      123567   0.000928
                 457   0.877011     2467   0.205057      124567   0.000749
                 167   0.371832     3467   0.212269      134567   0.010495
                 267   0.535779     1567   0.079731      234567   0.007620
                 367   0.602123     2567   0.079977      Sum      0.029037
                 467   [illegible]  3567   [illegible]
                 567   0.852954     4567   0.318429      Seven-set
                 Sum   14.453625    Sum    5.008254      1234567  0.000240

TABLE (29. 2). SCATTERANCES, BUTTER 1919-31.


Two-sets         Three-sets         Four-sets            Five-sets
Set  Scatterance Set   Scatterance  Set    Scatterance   Set      Scatterance

12   0.994990    123   0.507082     1234   0.233744      12345    0.013862
13   0.687733    124   0.850195     1235   0.076096      12346    0.012990
23   0.787931    134   0.318713     1245   0.055939      12356    0.002742
14   0.952541    234   0.399923     1345   0.167600      12456    0.002010
24   0.892555    125   0.177620     2345   0.106256      13456    0.027658
34   0.507588    135   0.446881     1236   0.031726      23456    0.017060
15   0.699055    235   0.232571     1246   0.047579      12347    0.007747
25   0.536446    145   0.658753     1346   0.153278      12357    0.003087
35   0.984998    245   0.296399     2346   0.110746      12457    0.003496
45   0.958417    345   0.486056     1256   0.006566      13457    0.005524
16   0.736198    126   0.073045     1356   0.108277      23457    0.019409
26   0.398256    136   0.331648     2356   0.063126      12367    0.001633
36   0.982863    236   0.262579     1456   0.108763      12467    0.003212
46   0.929687    146   0.559087     2456   0.055070      13467    0.005281
56   0.378311    246   0.355347     3456   0.086720      23467    0.021658
17   0.090008    346   0.468855     1237   0.029247      12567    0.000411
27   0.994272    156   0.257888     1247   0.057532      13567    0.004552
37   0.843822    256   0.145874     1347   0.011595      23567    0.004519
47   0.959484    356   0.320887     2347   0.299388      14567    0.006851
57   0.565292    456   0.181146     1257   0.011280      24567    0.004150
67   0.641913    127   0.069050     1357   0.021117      34567    0.034049
Sum  15.522360   137   0.042893     2357   0.068889      Sum      0.201901
                 237   0.598477     1457   0.041493
                 147   0.085693     2457   0.054938      Six-sets
                 247   0.836323     3457   0.204048      Set      Scatterance
                 347   0.422533     1267   0.004999      123456   0.000435
                 157   0.044422     1367   0.020266      123457   0.000454
                 257   0.163961     2367   0.062976      123467   0.000420
                 357   0.457940     1467   0.039623      123567   0.000112
                 457   0.537318     2467   0.070323      124567   0.000122
                 167   0.054503     3467   0.192534      134567   0.000910
                 267   0.104708     1567   0.016320      234567   0.001188
                 367   0.406681     2567   0.012452      Sum      0.003641
                 467   [illegible]  3567   [illegible]
                 567   0.207688     4567   0.090327      Seven-set
                 Sum   11.829945    Sum    2.847511      1234567  0.000015

The four-sets with the lowest scatterances are summarised
in (29. 3).

TABLE (29. 3). LOWEST FOUR-SETS, MEAT.

                    1256     1246     1236     1267

Scatterance        .0129    .0180    .0182    .0185

Subscatterances     .290     .260     .261     .049
                    .049     .049     .049     .342
                    .370     .282     .338     .371
                    .282     .321     .324     .535

These four-sets form a group quite distinct from the other
four-sets: they have markedly lower scatterances. The lowest
four-rowed scatterance after those listed in (29. 3) is 0.062 in
(1257). It will be noticed that the four-sets listed in (29. 3) are
those obtained by adding 3, 4, 5 and 7 respectively to (126).
Since (126) had markedly lower scatterance than the other
three-sets, it was to be expected that the four-sets containing
(126) would give low scatterances. The interesting thing is that
actually no other four-set comes in and competes with those
that derive their good fit from the connection with (126). We
also see that all these four-sets have a marked spread in their
subscatterances. All the four-sets listed in (29. 3) may therefore
be considered as promising. This checks with the bunch map
analysis where it was found that just these four-sets were the
ones to be considered.

In the five-sets a similar effect is found: the 6 sets obtained
by adding any two of the variates 3, 4, 5 and 7 to (126) form 
again a group by themselves having scatterances lower than 
the others. Again, it is interesting to note that no other set 
comes in. There is also sufficient spread in the subscatterances 
to consider any of these sets promising. This also checks with 
the bunch analysis. 

But when it comes to discriminating between the 6 promising
five-sets, the scatterance analysis proves inadequate. We then
need to take into account finer traits of the data which are not
revealed by the scatterances. From this point on we must
leave the scatterance and rely on the bunch-map analysis. This
is a good example of the general rule that the scatterances may
be useful in indicating roughly the simpler features of the data,
particularly in cases where these features are very distinctly
and strongly present. But the scatterances must not be pressed
to give information about the finer features: they will then
only give nonsensical information.

The most conspicuous difference between butter and meat is
that in butter we find already a two-set with a small
scatterance, namely (17). The reason for this was discussed in
Section 28. It is mainly a spurious correlation and should not be
taken as an expression for a structural relationship. The second
lowest two-set in butter is (56) with a scatterance of 0.378.
This simply expresses the fact that there is a strong trend in
the butter consumption. The third lowest is (26) with a
scatterance of 0.398. More precisely the variates 2 and 6 move in
the same direction, the correlation being positive. This can
probably be interpreted as a structural relation, namely the Engel
curve for butter: butter consumption increases as income
increases. Of course, the connection as exhibited in the gross
correlation between these two variates is not perfect, due to the
influence of other factors. The main conclusions from the study of
the three and higher rowed scatterances for butter are similar to
those for meat; the reader can himself easily carry this analysis
through.

Table (29.4) gives the line coefficients for butter. The two-rowed
line coefficients are the same as the scatterances and
need therefore no further comment.

The three-rowed line coefficients indicate (126) as the best
set. This is remarkable. The line coefficient in this set is even
smaller than that in (127) and in the other sets containing
the spuriously connected variates 1 and 7. Amongst the four-sets
the best is (1256) with a line coefficient of 0.16. Its sub-line
coefficients are 0.33, 0.15, 0.73 and 0.77. These show a considerable
spread and some are not small: (1256) is therefore a
decidedly promising set. This checks with the bunch analysis
which, it will be remembered, gave best tightness in (1256).

If we want to go from the four-sets to the five-sets, we see 
from table (29.4) that we must accept a marked increase in the 
line coefficient. Indeed, the lowest five-rowed line coefficients 
range about 0.25 as against 0.16 in the four-sets. This means 
that even the best five-set is such that the average tightness in 
its various bunches is poorer than in the set (1256). Whether
it is nevertheless admissible to consider some of these sets 
will depend on the strength of the slope changes. We have 
therefore here reached a point where the analysis cannot be 
pushed any further, purely on the basis of line coefficients. 
From this point on we have to rely on the bunch map.

TABLE (29.4). LINE COEFFICIENTS, BUTTER.

Three-sets          Four-sets           Five-sets

123   0.75678       1234  0.73496       12345    0.39597
124   0.04956       1235  0.54154       12346    0.44792
134   0.64270       1245  0.21205       12356    0.25537
234   0.77093       1345  0.59956       12456    0.26105
125   0.33214       2345  0.67764       13456    0.72467
135   0.72594       1236  0.31049       23456    0.76808
235   0.40395       1246  0.21117       12347    0.25577
145   0.87881       1346  0.70943       12357    0.55064
245   0.48107       2346  0.60937       12457    0.69146
345   0.75535       1256  0.16191       13457    0.30516
126   0.15412       1356  0.71003       23457    0.44367
136   0.52205       2356  0.75774       12367    0.50031
236   0.53238       1456  0.53301       12467    0.61122
146   0.73847       2456  0.65367       13467    0.29923
246   0.68533       3456  0.53726       23467    0.47551
346   0.74637       1237  0.29905       12567    0.52052
156   0.73153       1247  0.28732       13567    0.41245
256   0.77147       1347  0.20054       23567    0.31695
356   0.56212       2347  0.80703       14567    0.64992
456   0.33853       1257  0.61687       24567    0.44963
127   0.17730       1357  0.28626       34567    0.70898
137   0.17918       2357  0.44407
237   0.78496       1457  0.34145       Six-sets
147   0.23671       2457  0.23847       123456   0.28246
247   0.93000       3457  0.72540       123457   0.61304
347   0.73485       1267  0.57753       123467   0.56814
157   0.26183       1367  0.38595       123567   0.64833
257   0.35168       2367  0.45742       124567   0.52312
357   0.73664       1467  0.41675       134567   0.50627
457   0.80447       2467  0.32958       234567   0.48650
167   0.27406       3467  0.80264
267   0.24353       1567  0.49755       Seven-set
367   0.60943       2567  0.30720       1234567  0.60570
467   0.66595       3567  0.78466
567   0.76023       4567  0.58951

The line coefficients in the two-sets are simply the scatterances given
in (29.2).


30. EMPIRICAL DISTRIBUTION OF SCATTERANCES IN AN EIGHT-SET 
OF RANDOM VARIATES. 

For various purposes it is useful, as a standard of comparison,
to know the decline that takes place in the scatterances
as we pass to higher sets in the case where no systematic
connections exist between the variates. I have therefore had
computed the scatterances in all possible subsets of 3 eight-rowed
correlation matrices formed on the basis of random data.
The variates were constructed by lottery drawings. The
number of observations from which the correlations were
formed was N = 100. As an example the cumulated frequency
distribution of the four-rowed scatterances contained in one
of these matrices (No. III) is given in Figure 16. The complete
list of the above-mentioned scatterances, computed with
six decimal places, is available at the University Institute
of Economics, Oslo. A copy will be sent on request.
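The experiment described above can be imitated along the following lines. This is only a sketch under stated assumptions: the scatterance of a set is taken to be the determinant of its correlation matrix, the lottery drawings are replaced by Gaussian pseudo-random numbers, and the function names (`correlation_matrix`, `scatterance_summary`) and the seed are illustrative and do not come from the original computations.

```python
import itertools
import random
import statistics

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    centred = []
    for col in columns:
        mean = sum(col) / n
        centred.append([v - mean for v in col])
    norms = [sum(v * v for v in col) ** 0.5 for col in centred]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centred[i], centred[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    size = len(m)
    det = 1.0
    for c in range(size):
        pivot = max(range(c, size), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < 1e-15:
            return 0.0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, size):
            factor = m[r][c] / m[c][c]
            for cc in range(c, size):
                m[r][cc] -= factor * m[c][cc]
    return det

def scatterance(corr, subset):
    """Assumed definition: determinant of the correlation sub-matrix."""
    return determinant([[corr[i][j] for j in subset] for i in subset])

def scatterance_summary(n_variates=8, n_obs=100, seed=0):
    """Minimum, median and maximum scatterance for every dimensionality."""
    rng = random.Random(seed)  # stand-in for the lottery drawings
    columns = [[rng.gauss(0.0, 1.0) for _ in range(n_obs)]
               for _ in range(n_variates)]
    corr = correlation_matrix(columns)
    summary = {}
    for k in range(2, n_variates + 1):
        values = [scatterance(corr, s)
                  for s in itertools.combinations(range(n_variates), k)]
        summary[k] = (min(values), statistics.median(values), max(values))
    return summary

summary = scatterance_summary()
for k, (lowest, median, highest) in summary.items():
    print(f"{k}-rowed: min={lowest:.3f} median={median:.3f} max={highest:.3f}")
```

The exact figures depend on the pseudo-random draw, so they should only be compared in order of magnitude with the tables of this section.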

The median values and quartiles of the empirical distributions 
of these scatterances are given in tables (30. 1) — (30. 3). 



Fig. 16. 


TABLE (30.1). MEDIAN OF THE EMPIRICAL DISTRIBUTION OF SCATTERANCES
IN AN EIGHT-SET OF RANDOM VARIATES.

Set               Matrix No.
              I        II       III

2-rowed     0.991    0.995    0.995
3-rowed     0.967    0.972    0.976
4-rowed     0.925    0.949    0.933
5-rowed     0.867    0.908    0.888
6-rowed     0.797    0.853    0.827
7-rowed     0.731    0.788    0.771
8-rowed     0.675    0.746    0.715



TABLE (30.2).

Set         Lower quartiles in matrix No.    Upper quartiles in matrix No.
              I        II       III            I        II       III

2-rowed     0.979    0.984    0.986          0.998    0.998    0.999
3-rowed     0.939    0.964    0.954          0.976    0.985    0.984
4-rowed     0.885    0.927    0.900          0.948    0.960    0.956
5-rowed     0.829    0.874    0.859          0.905    0.927    0.911
6-rowed     0.776    0.816    0.811          0.844    0.880    0.856
7-rowed     0.705    0.776    0.759          0.751    0.816    0.785

TABLE (30.3). AVERAGE OF CHARACTERISTICS FORMED IN ABOVE
3 MATRICES.

Set         Median    Lower quartile    Upper quartile

2-rowed     0.994     0.983             0.998
3-rowed     0.972     0.952             0.982
4-rowed     0.936     0.904             0.955
5-rowed     0.888     0.854             0.914
6-rowed     0.826     0.801             0.860
7-rowed     0.763     0.747             0.784
8-rowed     0.712



Of course, these distributions do not represent the sampling
distribution of independent scatterances. For instance, some
of the four-rowed scatterances whose distribution is given in
Figure 16 are connected because they are all contained in one
and the same eight-rowed matrix. The thing that interests us
from the confluency view-point is this kind of distribution more
than the distribution of independent scatterances.

Some information about the distribution of independent
scatterances may however also be derived from the above data.
In Table (30.4) are listed the actual values of some scatterances
that are not connected.


TABLE (30.4). INDEPENDENT SCATTERANCES OF RANDOM VARIATES.

Number of                          Matrix No.
variates     Set           I           II          III

2-rowed      12          .982556     .999261     .991714
             34          .980425     .981367     .999166
             56          .998239     .973878     .998607
             78          .997071     .981084     .997393
3-rowed      123         .975821     .915083     .937775
             456         .962942     .963738     .987513
4-rowed      1234        .949019     .887814     .932655
             5678        .895032     .926488     .942310
5-rowed      12345       .926138     .851324     .924423
6-rowed      123456      .871833     .816179     .889732
7-rowed      1234567     .705379     .776624     .811419
8-rowed      12345678    .676130     .746849     .714657

TABLE (30.5). CHARACTERISTICS OF THE EMPIRICAL DISTRIBUTION
OF SCATTERANCES OF RANDOM DATA. (For each characteristic
several independent determinations were made by means of the computations
in the 3 matrices I, II and III above. For instance each median given
in the first column of the present table is an average of several
determinations. Similarly for the quartiles.)

Columns:
(a) Dimensionality of the big matrix containing the scatterances
    whose distribution is sought
(b) Dimensionality of the scatterances
(c) Median value of the scatterances
(d) Lower quartile
(e) Upper quartile
(f) Number of independent determinations of each characteristic
(g) Lowest value found for the median
(h) Highest value found for the median

(a)   (b)        (c)     (d)     (e)     (f)    (g)     (h)

2     2-rowed    .990    Same as median  12     .974    .999

3     2-rowed    .989    .973    .995    6      .965    .999
      3-rowed    .958    Same as median  6      .915    .995

4     2-rowed    .993    .977    .998    6      .986    .998
      3-rowed    .962    .942    .981    6      .944    .974
      4-rowed    .922    Same as median  6      .888    .949

5     2-rowed    .994    .966    .998    3      .992    .997
      3-rowed    .977    .948    .984    3      .970    .986
      4-rowed    .932    .992    .964    3      .888    .955
      5-rowed    .885    Same as median  3      .839    .927

6     2-rowed    .995    .985    .998    3      .993    .997
      3-rowed    .977    .962    .985    3      .970    .985
      4-rowed    .945    .918    .962    3      .933    .956
      5-rowed    .902    .881    .931    3      .863    .929
      6-rowed    .859    Same as median  3      .816    .890

7     2-rowed    .993    .982    .998    3      .991    .995
      3-rowed    .908    .951    .980    3      .962    .973
      4-rowed    .928    .900    .954    3      .907    .940
      5-rowed    .880    .848    .911    3      .834    .904
      6-rowed    .818    .802    .849    3      .778    .859
      7-rowed    .764    Same as median  3      .705    .811

8     2-rowed    .994    .983    .998    3      .991    .995
      3-rowed    .972    .952    .982    3      .967    .976
      4-rowed    .936    .904    .955    3      .925    .949
      5-rowed    .888    .854    .914    3      .867    .908
      6-rowed    .826    .801    .860    3      .797    .853
      7-rowed    .763    .747    .784    3      .731    .788
      8-rowed    .712    Same as median  3      .675    .746

The material that served to determine (30.3) may of course
also be utilised to find the corresponding medians and quartiles
for the distribution of scatterances contained in a matrix of
lower dimensionality than eight, to which (30.3) refers. Indeed,
if we leave out all those scatterances that contain the affix
No. 8 we obtain the corresponding distribution of scatterances
contained in a seven-set. Similarly, one may leave out two
affixes, and thus determine the distribution of scatterances
contained in a six-set, etc. The results of these computations are
given in table (30.5).

31. AN EXAMPLE IN 8 VARIATES FROM THE NEW ENGLAND
POTATO MARKET.

The examples of Sections 26 to 29 gave a rather close fit and
permitted us to draw, as it seems, significant conclusions
about the intercoefficients in question. We shall now discuss
an example where the fit is much poorer; its main interest will
be to illustrate how the bunch-analysis technique permits us to
obtain at least that little regression information which can be
squeezed out of the data. The example is the potato data for
which correlation coefficients are given in Section 2.

The complete set of scatterances in this example was com- 
puted. In table (31. 1) are given the lowest and highest values 
as well as the medians for each dimensionality of the scatter- 
ances. (A complete list will be sent on request). 


TABLE (31.1). SCATTERANCES IN THE POTATO DATA.

Set         Median value of    Lowest scatterance    Highest scatterance
            scatterance        found                 found

2-rowed     .954               .795                  .999
3-rowed     .848               .626                  .984
4-rowed     .708               .536                  .893
5-rowed     .619               .439                  .774
6-rowed     .473               .367                  .625
7-rowed     .369               .309                  .425
8-rowed     .283

It is quite obvious that the values of the scatterances in the
potato data are distinctly lower than for random variates. A
glance at (31.1) as compared with (30.3) shows this immediately.
Therefore some sort of organisation is undoubtedly present,
but, of course, it may not be good enough to permit us to
point out exactly how the various factors influence price. We
see for instance that even in the higher sets of the potato data
the organisation is very much poorer than in the meat and
butter data studied in the previous sections, e.g. the lowest
6-rowed scatterance in the potato data is decidedly higher than
the lowest 6-rowed scatterance in meat and butter. Although
the scatterances may not be able to discriminate between
certain difficult cases of multiple collinearity, as we have
seen, yet they do give an infallible necessary criterion: they
must be small if any good regression equation shall exist. There
is therefore no hope of getting a nice looking bunch map for the
potato data.
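The comparison between (31.1) and (30.3) can be set out explicitly. The sketch below simply places the potato medians beside the random-data median and lower quartile for each dimensionality; the variable names are illustrative, and the figures are those read from the two tables.

```python
# Random-data benchmark from (30.3): median and lower quartile.
random_median = {2: 0.994, 3: 0.972, 4: 0.936, 5: 0.888,
                 6: 0.826, 7: 0.763, 8: 0.712}
random_lower_quartile = {2: 0.983, 3: 0.952, 4: 0.904,
                         5: 0.854, 6: 0.801, 7: 0.747}

# Potato data from (31.1): median scatterance per dimensionality.
potato_median = {2: 0.954, 3: 0.848, 4: 0.708, 5: 0.619,
                 6: 0.473, 7: 0.369, 8: 0.283}

# For each dimensionality, is the potato median below the random lower quartile?
below_quartile = {k: potato_median[k] < random_lower_quartile[k]
                  for k in random_lower_quartile}

for k in sorted(potato_median):
    ratio = potato_median[k] / random_median[k]
    flag = below_quartile.get(k, "n/a")
    print(f"{k}-rowed: potato {potato_median[k]:.3f}, "
          f"random {random_median[k]:.3f}, ratio {ratio:.2f}, "
          f"below random lower quartile: {flag}")
```

In every dimensionality for which a quartile is available the potato median lies below the lower quartile of the random benchmark, which is the sense in which some organisation is present.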

The reason for this poor fit, even in the higher sets of the 
potato data, must be that one or more variates which are 
highly important for the determination of the price are not 
present amongst those here considered. One such variate that 
suggests itself is the quantity brought to the market: un- 
fortunately exact information about this variate was not avail- 
able in the present case. 

To see if it should not be possible nevertheless to squeeze
some information out of the data, a complete bunch map was
plotted for all sets up to and including the four-sets.

A first test that was made on this map was to see what
happened if the variate No. 1 (price) was added to any of the
two-sets. It turned out to be either detrimental or superfluous
(according to the criteria of Section 17) in all cases except
when added to the sets (24) and (25); in these two cases it
turned out to be useful. This information we shall now try to
verify in other ways. There are four variates involved in the
above preliminary conclusion, namely (1245). Let us see what
happens if we add one of these variates to any of the two-sets
contained in this four-set. Doing this we get the following
star map.

TABLE (31.2). STAR MAP FOR THE THREE-SETS IN (1245).

Adding                 To the two-sets
variate No.     12    14    24    15    25    45

1                           *           *     O
2                     *           *           O
4               *                 •     O
5               *     •     *

(* useful, • detrimental, O superfluous.)



In the first place we see that either of the two three-sets (124)
or (125) seems promising. Indeed, adding 1 to (24), 2 to (14)
or 4 to (12) we get a useful variate (marked with an asterisk
in (31.2)). The same is true if we add 1 to (25), 2 to (15) or
5 to (12). But if we add 4 to (15) or 5 to (14) we get an indication
of a detrimental variate. Finally, both 1 and 2 are indicated
as superfluous in addition to (45). This in connection with
the fact that we have the following values of the original
correlation coefficients


\[
\begin{aligned}
r_{12} &= -0.2108 \\
r_{14} &= -0.4526 \\
r_{24} &= +0.0145 \\
r_{15} &= -0.3152 \\
r_{25} &= -0.0547 \\
r_{45} &= +0.4436
\end{aligned}
\tag{31.3}
\]


suggests the following conclusion: The price 1 depends essentially
on the two qualities 2 and 4, these two qualities being
practically uncorrelated in the material at hand, the set (24)
displaying the smallest gross correlation of all the two-sets.
But on the other hand there is a close relation between
the two qualities 4 and 5, so that we may also as an alternative
consider expressing price as a function of the two qualities 2
and 5. But we must not make an attempt to express price
simultaneously as a function of 4 and 5. The alternative (124)
seems to be slightly better than (125).

The above conclusion is corroborated by the following further
features of the bunch map. All the bunches in the three-sets
that include price indicate that the price depends negatively
on both the two qualities involved. If the bunches in any such
three-set should show compatible signs, the bunch exhibiting
the intercoefficient between the two qualities ought therefore
to indicate a negative slope. This is the case in the sets (124)
and (125) but in none of the other three-sets; in particular the
signs are not compatible in (145).

Finally, the star map for the four-set (1245) is 


TABLE (31.4). STAR MAP FOR (1245). 


Variate added

Intercoefficient between variates Nos.

12

14

24

15

25

45

1 


O 

O 


1 ° 

• 

2 



1 O 


O 

4 

O 



• 

• 


5 

" 

o 

• 

I 




If the fit in the three-sets had been better, there would have
been produced more of an "explosion" by passing to the four-sets,
but even as it is (31.4) indicates clearly that the set (1245)
should not be accepted. In other words, the two three-sets (124)
and (125) must be kept distinct.

In normalised coordinates the diagonal regression equation 
in (124) is 

\[
\underset{\text{(price)}}{x_1} = -0.892\,\underset{\text{(size)}}{x_2} - 0.978\,\underset{\text{(colour)}}{x_4}
\tag{31.5}
\]

Of the two variates in the right member of (31.5) the above 
criteria indicate (colour) as the most important. 
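The coefficients in (31.5) can be reproduced from the correlations in (31.3). The sketch below assumes that the diagonal regression takes, as the coefficient of x_j in the equation for x_i, the square root of the ratio of the diagonal cofactors of the correlation matrix, with the sign determined by the off-diagonal cofactor; the function names are illustrative. Under that assumption the two printed coefficients are recovered to three decimals.

```python
import math

# Correlations from (31.3) for the set (124): price (1), size (2), colour (4).
r12, r14, r24 = -0.2108, -0.4526, 0.0145
corr = [[1.0, r12, r14],
        [r12, 1.0, r24],
        [r14, r24, 1.0]]

def cofactor(m, i, j):
    """Cofactor of element (i, j) of a 3x3 matrix."""
    rows = [r for r in range(3) if r != i]
    cols = [c for c in range(3) if c != j]
    minor = (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
             - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])
    return (-1) ** (i + j) * minor

def diagonal_coefficient(m, i, j):
    """Coefficient of x_j when x_i is expressed by the diagonal regression.

    Assumed form: magnitude sqrt(C_jj / C_ii), sign opposite to that of C_ij.
    """
    sign = -1.0 if cofactor(m, i, j) > 0 else 1.0
    return sign * math.sqrt(cofactor(m, j, j) / cofactor(m, i, i))

b_size = diagonal_coefficient(corr, 0, 1)
b_colour = diagonal_coefficient(corr, 0, 2)
print(round(b_size, 3), round(b_colour, 3))  # -0.892 -0.978
```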

The concrete knowledge which we have of the data indicates
that the result summarised in (31.5) is quite plausible. It is
at least a more plausible result than the one which seemed to
follow from the ellipsoid method of Section 2.

If we add the significance factor S, the regression coefficients
of (31.5) can be written

(31.6)     -0.892 •/• 0.229 and -0.978 •/• 0.460.

It is interesting to compare the above final conclusion with
the behaviour of the scatterances. The smallest scatterance
in the three-sets is that in (145) which is equal to 0.62. If we
should let ourselves be guided only by this, we would be led
to adopting just that three-set which by the bunch analysis
has been characterised as a dangerous inadmissible set. In
other words, we just have the second, the dangerous one,
of the two alternatives that were discussed in Section 1 in
connection with the interpretation of small scatterances. The
subscatterances in the set (145) are
(31.7) 0.80, 0.90 and 0.80 
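These figures follow from the correlations in (31.3) if the scatterance is taken as the determinant of the correlation matrix. The sketch below works under that assumption; in addition the sign of r15 is taken to be negative, an assumption which is consistent with the lowest three-rowed scatterance .626 printed in (31.1).

```python
# Correlations within the set (145) from (31.3); the negative sign of r15
# is an assumption chosen because it reproduces the printed scatterance.
r14, r15, r45 = -0.4526, -0.3152, 0.4436

def scatterance3(a, b, c):
    """Determinant of a 3x3 correlation matrix with off-diagonals a, b, c."""
    return 1.0 + 2.0 * a * b * c - a * a - b * b - c * c

scatterance_145 = scatterance3(r14, r15, r45)

# Two-rowed subscatterances of (14), (15) and (45): 1 - r^2.
subscatterances = [1.0 - r * r for r in (r14, r15, r45)]

print(round(scatterance_145, 3))                 # 0.626
print([round(s, 2) for s in subscatterances])    # [0.8, 0.9, 0.8]
```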

In order to arrive at that interpretation of the scatterances
which we now, from the bunch analysis, know is the
correct one, we would have to be so conservative as to take
the three numbers in (31.7) as essentially equal. Possibly the
whole appearance of the data might have led us to such a
scepticism, but at any rate it is evident that the complete
bunch analysis furnishes a much more conclusive technique.

32. CONFLUENCE ANALYSIS AS A MEANS OF DETERMINING TREND 
PARABOLAE AND OTHER CURVE FITTINGS. 

A problem that is frequently encountered in time series analysis
is to fit a polynomial as a "trend" in a given time series.
The question then arises as to how high a degree one should
take in the polynomial. The confluence analysis technique
may be a help in answering this question. Of course, strictly
speaking, a number of terms representing different powers of
the time-variate can never be linearly dependent, but it may
be that the approximate linear connection that exists between
such terms over a short interval makes the whole fitting apparatus
so much more sensitive to the random disturbances that
it would be better to be satisfied with a smaller number of
terms. The point where to stop may then be decided by considering
the given time series (t denoting time) as well as
the successive powers of t as so many variates to be thrown
together in a regression analysis and scrutinised by a bunch
and star map analysis. In practice it is usually better to use
binomial coefficients in t instead of powers.
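The approximate linear connection between successive powers of t over a short interval is easily exhibited numerically. The following sketch takes t = 1, 2, ..., 10 (an arbitrary illustrative interval), treats t, t² and t³ as three variates, and computes their scatterances, here assumed to be the determinants of the correlation matrices.

```python
import math

def correlation(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def scatterance3(a, b, c):
    """Determinant of a 3x3 correlation matrix with off-diagonals a, b, c."""
    return 1.0 + 2.0 * a * b * c - a * a - b * b - c * c

t = list(range(1, 11))                     # illustrative short interval
powers = {k: [v ** k for v in t] for k in (1, 2, 3)}

r12 = correlation(powers[1], powers[2])
r13 = correlation(powers[1], powers[3])
r23 = correlation(powers[2], powers[3])

two_rowed = 1.0 - r12 ** 2                 # scatterance of (t, t^2)
three_rowed = scatterance3(r12, r13, r23)  # scatterance of (t, t^2, t^3)

print(f"r(t, t^2) = {r12:.4f}")
print(f"two-rowed scatterance   = {two_rowed:.4f}")
print(f"three-rowed scatterance = {three_rowed:.6f}")
```

Both scatterances are very small compared with the random benchmark of Section 30, although the powers are not exactly linearly dependent.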

More generally the same procedure may, of course, be applied
if it is wanted to fit to a given series a linear form in any
sequence of prescribed functions.

The bunch analysis may also be applied in testing the significance
of the information obtained by successive laggings of
a given time series. This is equivalent to using the confluence
technique in order to determine the significance of the coefficients
of a difference equation which the given series is
assumed to satisfy. The confluence analysis would tell how
high an order of the difference equation it would have a meaning
to consider.

More generally the same kind of analysis may be applied to
test the significance of the information obtained by applying
to a given series a number of different linear operations (moving
totals with different weight systems).


33. THE INADEQUACY OF THE CLASSICAL SAMPLING THEORY AS A 
MEANS OF TESTING LINEAR CONFLUENCY. 

In concluding the present investigation it may be interesting
to compare the results obtained by bunch analysis with those
obtained by the classical sampling error approach in a case of
linear confluency.

As an example I select the constructed example in Section
23, where we actually know the composition of the data and
thus can see directly which method leads to an approximately
correct result and which one gives nonsense results.

The classical sampling error method of testing the "significance"
of the coefficients in a regression equation is as follows.
If the number of observations is small, the coefficients have to
be tested by "Student"'s distribution. But if the number of
observations is fairly large, as it is in the constructed
example where N = 100, there is only a negligible difference
between the result obtained by using "Student"'s distribution
and that obtained by simply computing the ordinary standard
errors of the regression coefficients. In the present case it
will amply suffice to consider the ordinary standard errors.

The formula for the elementary regression coefficient of the
non-normalised variate x_i on x_j taken within the set (αβ...γ) is

\[
b_{ij(\alpha\beta\ldots\gamma)} = -\frac{\hat m_{ij(\alpha\beta\ldots\gamma)}}{\hat m_{ii(\alpha\beta\ldots\gamma)}}
\tag{33.1}
\]

where the m̂'s are the elements of the adjoint moment matrix
in the set (αβ...γ).

The standard error of (33.1) according to the classical formulae * is

\[
\sigma\!\left[b_{ij(\alpha\beta\ldots\gamma)}\right]
= \frac{1}{\sqrt{N'}}\,\sqrt{\frac{m_{ii}}{m_{jj}}}\cdot
\frac{\sqrt{\Delta_{\alpha\beta\ldots\gamma}\cdot\Delta_{\alpha\beta\ldots)ij(\ldots\gamma}}}{\Delta_{\alpha\beta\ldots)i(\ldots\gamma}}
\tag{33.2}
\]

where the Δ's are the scatterances and the m_ii = [x_i x_i] the sum
squares of the variates extended over all observations. The
number N' is the corrected number of observations (the degrees
of freedom), namely

\[
N' = N - \nu
\tag{33.3}
\]

where N is the actual number of observations and ν the number
of variates in the regression equation, i.e. the number of affixes
in the set (αβ...γ).

If the regression equation is written in normal coordinates,
the regression coefficient of x_i on x_j in the set (αβ...γ) will be

\[
\beta_{ij(\alpha\beta\ldots\gamma)}
= b_{ij(\alpha\beta\ldots\gamma)}\sqrt{\frac{m_{jj}}{m_{ii}}}
= -\frac{\hat r_{ij(\alpha\beta\ldots\gamma)}}{\hat r_{ii(\alpha\beta\ldots\gamma)}}
\tag{33.4}
\]

where the r̂'s are the elements of the adjoint correlation
matrix in the set (αβ...γ); in other words, they are the elements
in the tilling table for the set (αβ...γ). When the tilling
technique is used, it is most convenient first to compute the
regression equations in normalised coordinates, and then, if
wanted, to pass to the coefficients b by means of the first
equation in (33.4).

The standard error of (33.4) is $\sqrt{m_{jj}/m_{ii}}$ times the standard
error of (33.1), that is

* See for instance Yule: An Introduction to Statistics, chapter XVII, or
Ezekiel: Methods of Correlation Analysis, page 258. The standard errors of
the regression coefficients are usually given by means of recurrence formulae
to be applied in connection with certain numerical computation schemes. For
our purpose it is better to consider the explicit formula (33.2). This formula
is even to be preferred in actual computation whenever the scatterances are
available (for instance through the complete tilling).

\[
\sigma\!\left[\beta_{ij(\alpha\beta\ldots\gamma)}\right]
= \frac{1}{\sqrt{N'}}\cdot
\frac{\sqrt{\Delta_{\alpha\beta\ldots\gamma}\cdot\Delta_{\alpha\beta\ldots)ij(\ldots\gamma}}}{\Delta_{\alpha\beta\ldots)i(\ldots\gamma}}
\tag{33.5}
\]

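A sketch of how such a standard error may be computed directly from scatterances is given below. It assumes that the normalised coefficient is minus the ratio of the cofactors of the correlation matrix, and that its standard error is the square root of the product of the scatterance of the full set and of the set with both i and j omitted, divided by the scatterance of the set with i omitted and by the square root of N' = N - ν. The function names are illustrative. For a two-set the sketch reduces to the familiar values r and sqrt((1 - r²)/N').

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination; the empty matrix gives 1."""
    m = [row[:] for row in matrix]
    size = len(m)
    det = 1.0
    for c in range(size):
        pivot = max(range(c, size), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < 1e-15:
            return 0.0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, size):
            factor = m[r][c] / m[c][c]
            for cc in range(c, size):
                m[r][cc] -= factor * m[c][cc]
    return det

def scatterance(corr, subset):
    """Assumed definition: determinant of the correlation sub-matrix."""
    return determinant([[corr[a][b] for b in subset] for a in subset])

def normalised_regression(corr, i, j, subset, n_obs):
    """Coefficient of x_j in the regression of x_i, and its standard error."""
    p, q = subset.index(i), subset.index(j)
    rows = [k for k in subset if k != i]
    cols = [k for k in subset if k != j]
    minor = determinant([[corr[a][b] for b in cols] for a in rows])
    cofactor_ij = (-1) ** (p + q) * minor
    without_i = [k for k in subset if k != i]
    without_ij = [k for k in subset if k not in (i, j)]
    cofactor_ii = scatterance(corr, without_i)
    beta = -cofactor_ij / cofactor_ii
    n_corrected = n_obs - len(subset)          # N' = N - nu
    error = (math.sqrt(scatterance(corr, subset) * scatterance(corr, without_ij))
             / (cofactor_ii * math.sqrt(n_corrected)))
    return beta, error

# Two-variate check with an illustrative correlation r = 0.6 and N = 102.
corr2 = [[1.0, 0.6], [0.6, 1.0]]
beta2, error2 = normalised_regression(corr2, 0, 1, [0, 1], 102)
print(round(beta2, 4), round(error2, 4))
```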


The formula (11.15) shows that in any subset (αβ...γ) which,
so far as the systematic components of the variates are concerned,
is multiply collinear, both the scatterance and the
subscatterances will have the disturbances as their principal
terms. In other words, they will be meaningless from the viewpoint
of the regression equations studied. In any multicollinear
set the standard errors (33.2) and (33.5) will therefore in point
of principle be of the indeterminate form, and in practice
they will have to be looked upon as numbers drawn at random.
In other words, they will be entirely meaningless as tests of
the "significance" of the regression coefficients. If we nevertheless
use them we actually take a number drawn out of one
hat as an expression for the "significance" of some other
number drawn out of another hat.

That this is actually so is verified in a striking manner if we
compute the elementary regression coefficients and their standard
errors in the set (1234) of the constructed example of
Section 23. The results are given in (33.6) and (33.7). In
these tables the regression coefficients are printed in large
types and the corresponding standard errors in small types.
The regression coefficients in table (33.6) are obtained simply
by reducing the elements on each row in the four-rowed tilling
table 3 in Section 23, in such a way that the diagonal element
becomes equal to -1. The other elements in that row will
then be the regression coefficients (33.4).


TABLE (33.6). ELEMENTARY REGRESSION COEFFICIENTS AND THEIR
STANDARD ERRORS IN THE SET (1234) IN THE CONSTRUCTED
EXAMPLE.

              1          2          3          4

1          -1.0000    - .1126      .7214      .6596
   s.e.      .0000      .1017      .0692      .0767

2          - .1120    -1.0000      .7408    - .6593
   s.e.      .1012      .0000      .0667      .0768

3            .7362      .7599    -1.0000      .0173
   s.e.      .0700      .0684      .0000      .1031

4            .6607    - .6637      .0169    -1.0000
   s.e.      .0768      .0768      .1012      .0000

TABLE (33.7). ELEMENTARY REGRESSION COEFFICIENTS AND THEIR
STANDARD ERRORS IN THE SET (12345) IN THE CONSTRUCTED
EXAMPLE.

              1          2          3          4          5

1          -1.0000    - .1417      .7414      .6423      .0153
   s.e.      .0000      .1048      .0714      .0780      .0189

2          - .1328    -1.0000      .7554    - .6341      .0350
   s.e.      .0082      .0000      .0648      .0745      .0181

3            .7162      .7788    -1.0000      .0366    - .0343
   s.e.      .0600      .0668      .0000      .1008      .0183

4            .6472    - .6817      .0382    -1.0000      .0112
   s.e.      .0786      .0801      .1046      .0000      .0140

5            .8137     1.9924    -1.8936      .5932    -1.0000
   s.e.      .7482      .7451      .7357      .7424      .0000



If the standard errors should be reliable warning signals, 
they ought to tell us to keep away from any of these regression 
equations. Indeed, from the way in which the example was
constructed we know that not a single one of the regression
coefficients in the tables (33.6) and (33.7) has a meaning.

The minimum requirement which the standard errors must
satisfy in order to be such a warning signal is that for any
given regression coefficient the standard error must be equal
to at least one-third or a quarter of the absolute value of the
regression coefficient in question. Otherwise we would conclude
that at least the sign of this regression coefficient is significant.
How is this fulfilled in the tables (33.6) and (33.7)? It is very
far from fulfilled. Take for instance the first equation in (33.6).
Here we have the coefficient 0.7214 and its standard error 0.07. In
other words, the standard error is less than one-tenth of the
regression coefficient. And for the next regression coefficient the
standard error is about one-tenth. No statistician who is used
to working with standard errors would hesitate to conclude that
the last two regression coefficients are significant. At least he
would conclude that it is practically certain that both these
coefficients are positive. From the way in which the example
was constructed we know that this is sheer nonsense; a regression
equation in the set (1234) has indeed no meaning at all.

In the second equation of (33.6) we have a similar situation.
One would here conclude for instance that β_23 is significantly
positive and β_24 significantly negative, and so on.

In view of the fact that the standard errors are now to be
looked upon as numbers drawn at random, we would expect
that, roughly speaking, one-half of them will actually do the
right thing, namely to warn us against the regression coefficient
in question, and the other half would do the wrong thing,
namely not to warn us. Disregarding the diagonal elements
that are by necessity equal to -1, we have in (33.6) 12 standard
errors of which 8 do the wrong thing. In (33.7) we may
disregard the coefficients in the last column where, as we
have previously seen, a persistency effect is present. This
leaves us with 16 coefficients of which 8 do the wrong thing.
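The count for (33.6) can be verified mechanically. In the sketch below a standard error is said to do the wrong thing when it is smaller than one-third of the absolute value of its regression coefficient, i.e. when it fails to warn; the figures are those of table (33.6), and the function name and the threshold argument are illustrative.

```python
# Regression coefficients and standard errors from table (33.6).
coefficients = [
    [-1.0000, -0.1126, 0.7214, 0.6596],
    [-0.1120, -1.0000, 0.7408, -0.6593],
    [0.7362, 0.7599, -1.0000, 0.0173],
    [0.6607, -0.6637, 0.0169, -1.0000],
]
standard_errors = [
    [0.0000, 0.1017, 0.0692, 0.0767],
    [0.1012, 0.0000, 0.0667, 0.0768],
    [0.0700, 0.0684, 0.0000, 0.1031],
    [0.0768, 0.0768, 0.1012, 0.0000],
]

def count_failures(coefs, errors, threshold=1.0 / 3.0):
    """Count off-diagonal standard errors that fail to warn."""
    wrong = total = 0
    for i, row in enumerate(coefs):
        for j, b in enumerate(row):
            if i == j:
                continue  # diagonal elements are -1 by construction
            total += 1
            if errors[i][j] < threshold * abs(b):
                wrong += 1
    return wrong, total

wrong, total = count_failures(coefficients, standard_errors)
print(wrong, "of", total, "standard errors fail to warn")  # 8 of 12
```

Taking a quarter instead of one-third as the threshold leaves the count unchanged for this table.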

The bunch analysis of Section 24, it will be remembered, furnished
the correct criteria for the nonsense of the regression
equation in the sets (1234) and (12345), and did it with such
distinctness that there could be no doubt about the conclusiveness
of the result.

I do not claim that the technique developed in the present
paper will, like a stone of the wise, solve all the problems of
testing "significance" with which the economic statistician is
confronted. No statistical technique, however refined, will ever
be able to do such a thing. The ultimate test of significance
must consist in a network of conclusions and cross checks
where theoretical economic considerations, intimate and realistic
knowledge of the data and a refined statistical technique
concur. But I do claim that the technique here presented will
in a great number of cases be very helpful. I would even
venture to say that for many kinds of problems it will be
indispensable, until something better is found that can
replace it.


STOCKHOLM 1934, KLARA CIVILTRYCKERI A.-B. 








